Abstract:

We build confidence balls for the common density $s$ of a real valued sample $X_1, \dots, X_n$. We use resampling methods to estimate the projection of $s$ onto finite dimensional linear spaces and a model selection procedure to choose an optimal approximation space. The covering property is ensured for all $n \geq 2$ and the balls are adaptive over a collection of linear spaces.
1 Introduction

In this paper, we discuss the problem of adaptive confidence balls, from a non-asymptotic point of view, in the particular context of density estimation. Let $S$ be a set of densities with respect to the Lebesgue measure $\mu$ on $\mathbb{R}$. Given an i.i.d. sample $X_{1:n} = (X_1, \dots, X_n)$ and a confidence level $\beta \in (0, 1)$, a confidence set (hereafter CS) $B_\beta(X_{1:n})$ on $S$ is a subset of $S$ satisfying the following covering property:

$$\forall s \in S, \quad P_s\left(s \in B_\beta(X_{1:n})\right) \geq 1 - \beta \qquad (1)$$

where, for all $s$ in $S$, $P_s$ denotes the distribution of $X_{1:n}$ when the marginals have common density $s$. All the CS considered in this paper are $L^2$-balls, centered on estimators $\hat{s}$ of $s$, with random radius $\rho_\beta$. The quality of a CS is measured with the quantiles of $\rho_\beta$. We are looking for adaptive CS, which means that, given a collection $(S_m)_{m \in \mathcal{M}_n}$ of subsets of $S$, $\rho_\beta$ should be as small as possible over all the sets $(S_m)_{m \in \mathcal{M}_n}$.
This problem was mostly considered in regression frameworks, see among others Li [25], Lepski [23], Juditski & Lepski [20], Hoffmann & Lepski [14], Juditski & Lambert-Lacroix [19], Baraud [4], Beran [5], Beran & Dümbgen [6], Cai & Low [9], Genovese & Wasserman [12], [13]. Robins & van der Vaart [28] considered a more general Hilbertian framework that includes in particular density estimation and some regression frameworks.

Our adaptive balls are derived from a model selection procedure, which is essentially the one of Baraud [4]. We start with a collection of linear spaces $(S_m)_{m \in \mathcal{M}_n}$ and associate to each of these the projection estimator $\hat{s}_m$ of $s$ and some positive number $\rho(m)$. The $\rho(m)$'s are suitably calibrated to satisfy the property that, with probability close to one, the distance between $s$ and its projection estimator $\hat{s}_m$ is not larger than $\rho(m)$. We then select $\hat{m}$ as the minimizer of $\rho(m)$ and define the confidence ball as the $L^2$-ball centered at $\hat{s}_{\hat{m}}$ of radius $\rho(\hat{m})$.
We use two different ingredients to compute $\rho(m)$. The first one is a resampling estimator of $\|s_m - \hat{s}_m\|^2$, where $s_m$ denotes the projection of $s$ onto $S_m$. It is naturally derived from Efron's heuristic (see Efron [10]), in the same way as Arlot, Blanchard & Roquain [2]. This allows us in particular to keep the whole sample to build $\hat{s}_m$. This is an improvement compared with Robins & van der Vaart [28] or Cai & Low [9], who cut the sample into two parts, the first one being used to build an estimator $\hat{s}$ of $s$ and the other to evaluate the distance $\|\hat{s} - s\|^2$.
The second ingredient is an estimator of $\|s - s_m\|^2$, based on $U$-statistics, as in Laurent [21], [22].
The proofs are handled thanks to a concentration inequality for $U$-statistics, derived from Houdré & Reynaud-Bouret [15]. The main advantage of a model selection approach is that the resulting CS are non-asymptotic, i.e. (1) holds for all $n$. Moreover, the CS behaves well even if $s$ does not belong to $S$, which outperforms, in that case, the result of Li [25].
Let $S$ be a linear space with dimension $d$ and let $(S_m)_{m \in \mathcal{M}_n}$ be a collection of linear subspaces of $S$, with respective dimensions $(d_m)_{m \in \mathcal{M}_n}$. The diameter of our CS on $S$ is upper bounded, for any $s$ in $S_m$, by $C(\sqrt{d} \vee d_m)/n$, where $C$ is a constant, free from $d$, $d_m$, and $n$. This bound is optimal in the minimax sense. Hence, adaptation is possible over collections of subspaces with dimension $d_m \geq \sqrt{d}$ for $L^2$-balls. This positive result does not hold in general; in particular, adaptation is impossible for $L^\infty$-balls (Low [26]). However, the adaptation property is strongly limited since it is impossible over spaces with dimension $d_m < \sqrt{d}$. This negative result was already proved asymptotically in Li [25], Hoffmann & Lepski [14], Juditski & Lambert-Lacroix [19], Robins & van der Vaart [28]. It was proved non-asymptotically in a regression framework in Baraud [4]. We use the method of Baraud [4] and extend his result to the density estimation framework.

The paper is organized as follows. Section 2 introduces the notations and the main assumptions. Section 3 presents the technical tools required for the construction of our CS. Section 4 gives the main results: we build our CS, give upper bounds on their size and prove their optimality in the minimax sense. Section 5 presents a short simulation study, where we illustrate the behavior of our resampling-based estimators. All the proofs are postponed to Section 6. We add in an Appendix the proofs of some technical lemmas.

2 Notations and assumptions 

2.1 Notations 

Hereafter, $L^2(\mu)$ denotes the space of all measurable functions $t : \mathbb{R} \to \mathbb{R}$ such that $\int_{\mathbb{R}} t^2(x)\,d\mu(x) < \infty$. It is endowed with its classical scalar product defined, for all $t, t'$ in $L^2(\mu)$, by $\langle t, t' \rangle = \int_{\mathbb{R}} t(x) t'(x)\,d\mu(x)$ and with the associated $L^2$-norm defined, for $t$ in $L^2(\mu)$, by $\|t\| = \sqrt{\langle t, t \rangle}$.
For any density $s$, we denote by $P_s$ the distribution of an i.i.d. sample $X_{1:n} = (X_1, \dots, X_n)$ with common marginal density $s$ and by $E_s$ the expectation with respect to $P_s$.
Hereafter, $S$, with various subscripts, denotes a linear subspace of $L^2(\mu)$ and $S^*$ the set of densities in $S$. For all sets $F$ in $L^2(\mu)$, the $L^2$-diameter of $F$ is defined by

$$\Delta(F) = \sup_{(t,t') \in F^2} \|t - t'\|.$$

For a random set $B$ in $L^2(\mu)$, a linear space $S$ of measurable functions and a real number $\alpha$ in $(0, 1)$, we define the $(S, \alpha)$-size of $B$ as

$$\Delta_{S,\alpha}(B) = \inf\left\{\delta > 0,\ \sup_{s \in S^*} P_s\left(\Delta(B) > \delta\right) \leq \alpha\right\}. \qquad (2)$$

For all index sets $\Lambda$, $(\psi_\lambda)_{\lambda \in \Lambda}$ will always denote an orthonormal system in $L^2(\mu)$.



2.2 Efron's resampling heuristic 

Let $X, X_1, \dots, X_n$ be i.i.d. random variables with common density $s$, and let $P_s$ and $P_n$ denote the following processes, defined respectively for all functions $t$ in $L^2(\mu)$ and for all measurable functions $t$ by

$$P_s t = \langle s, t \rangle = \int_{\mathbb{R}} t(x) s(x)\,d\mu(x) = E(t(X)), \qquad P_n t = \frac{1}{n} \sum_{i=1}^n t(X_i).$$

Hereafter, a resampling scheme $(W_1, \dots, W_n)$ is a vector of real valued random variables, independent of $(X_1, \dots, X_n)$ and exchangeable, which means that, for all permutations $\tau$ of $1, \dots, n$,

$(W_{\tau(1)}, \dots, W_{\tau(n)})$ has the same law as $(W_1, \dots, W_n)$.

Let $(W_1, \dots, W_n)$ be a resampling scheme, let $\bar{W}_n = \sum_{i=1}^n W_i / n$ and let $P_n^W$ denote the resampling-based empirical process defined, for all measurable functions $t$, by

$$P_n^W t = \frac{1}{n} \sum_{i=1}^n W_i\, t(X_i).$$

For all random variables $F(X_1, \dots, X_n, W_1, \dots, W_n)$, we denote by

$$E_W\left(F(X_1, \dots, X_n, W_1, \dots, W_n)\right) = E\left(F(X_1, \dots, X_n, W_1, \dots, W_n) \mid X_1, \dots, X_n\right).$$

Let $F$ be a known functional and $F_n = F(P_n, P_s)$; we define the resampling estimator of $F_n$ by

$$F_n^W = C_W\, E_W\left(F(P_n^W, \bar{W}_n P_n)\right),$$

where $C_W$ is a constant depending only on the functional $F$ and the law of the resampling scheme. Efron's heuristic states that $F_n^W$ provides a sharp estimator of $F_n$ when the constant $C_W$ is well chosen.
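As an illustration of these definitions, here is a minimal Python sketch of one resampling scheme, Efron's bootstrap weights (a multinomial vector, as used in the simulation study), together with the processes $P_n$ and $P_n^W$. The function names are hypothetical and chosen for this example only. For these weights $\bar{W}_n = 1$ and $\mathrm{Var}(W_1) = 1 - 1/n$, so the normalisation $C_W = \mathrm{Var}(W_1 - \bar{W}_n)^{-1}$ used later in the paper equals $n/(n-1)$.

```python
import random


def efron_weights(n, rng):
    """Draw Efron's weights: a Multinomial(n; 1/n, ..., 1/n) vector.

    W_i counts how many times index i is drawn in n draws with replacement,
    so the vector is exchangeable and sums to n.
    """
    weights = [0] * n
    for _ in range(n):
        weights[rng.randrange(n)] += 1
    return weights


def empirical_mean(t, xs):
    """P_n t = (1/n) * sum_i t(X_i)."""
    return sum(t(x) for x in xs) / len(xs)


def resampled_mean(t, xs, ws):
    """P_n^W t = (1/n) * sum_i W_i * t(X_i)."""
    return sum(w * t(x) for w, x in zip(ws, xs)) / len(xs)


def efron_constant(n):
    """C_W = 1 / Var(W_1 - mean(W)) for Efron's weights, i.e. n / (n - 1)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    return n / (n - 1)


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.random() for _ in range(20)]
    ws = efron_weights(len(xs), rng)
    print(empirical_mean(lambda x: x, xs), resampled_mean(lambda x: x, xs, ws))
```

Any other exchangeable weight vector independent of the sample could be substituted for `efron_weights`; only the constant $C_W$ would change.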

2.3 Balls in functional spaces 

Our method is strongly based on empirical process methods, in particular on Talagrand's concentration inequality. This inequality involves some $L^\infty$-norms, which is why we introduce the following notations. Let $S$ be a linear space of measurable functions. For any function $t$ in $L^2(\mu) \cap L^\infty(\mu)$, let $\pi_S(t)$ denote its orthogonal projection onto $S$ and let $\|t\|_\infty$ be its $L^\infty$-norm. For all $C, C', \eta$ in $\mathbb{R}_+$, for all $t$ in $L^2(\mu)$, let

$$B_2(t, C, S) = \{t' \in S,\ \|t' - t\| \leq C\}, \qquad B(S) = B_2(0, 1, S) = \{t \in S,\ \|t\| \leq 1\}. \qquad (3)$$

$$B_{2,\infty}(C, C', \eta, S) = \{t \in L^2(\mu) \cap L^\infty(\mu),\ \|t\| \leq C,\ \|t\|_\infty \leq C',\ \|t - \pi_S(t)\| \leq \eta\}. \qquad (4)$$

2.4 Basic definitions 

Definition 2.1. (Confidence Sets) 

Let $(X_1, \dots, X_n)$ be an i.i.d. sample of real valued random variables, let $S \subset L^2(\mu)$ and let $\beta$ be a real number in $(0, 1)$. The set $CS(S, \beta)$ of $(1 - \beta)$-confidence balls on $S$ is defined as the collection of all subsets $B_\beta = B_2(\hat{s}, \rho_\beta, S)$ of $L^2(\mu)$, where $\hat{s}$ and $\rho_\beta$ are measurable with respect to $\sigma(X_1, \dots, X_n)$, such that

$$\forall s \in S^*, \quad P_s\left(s \in B_\beta\right) \geq 1 - \beta.$$



Definition 2.2. (Minimax rate of convergence for confidence sets) 

Let $(X_1, \dots, X_n)$ be an i.i.d. sample of real valued random variables, let $S' \subset S \subset L^2(\mu)$ and let $\alpha, \beta$ be real numbers in $(0, 1)$. The $(\alpha, \beta)$-minimax rate of convergence over $S'$ for CS on $S$ is defined as

$$\phi_n(\alpha, \beta, S, S') = \inf_{B_\beta \in CS(S, \beta)} \Delta_{S',\alpha}(B_\beta).$$

Definition 2.3. (Adaptive confidence sets) 

Let $(X_1, \dots, X_n)$ be an i.i.d. sample of real valued random variables, let $S \subset L^2(\mu)$, let $(S_m)_{m \in \mathcal{M}_n}$ be a collection of subsets of $S$ and let $\alpha, \beta$ be real numbers in $(0, 1)$. A CS $B_\beta$ in $CS(S, \beta)$ is said to be optimal, or adaptive over $(S_m)_{m \in \mathcal{M}_n}$, if the following condition holds.

For all fixed $\alpha$ in $(0, 1)$, there exists a constant $c(\alpha, \beta) > 0$, free from $n$, $S$ and $(S_m)_{m \in \mathcal{M}_n}$, such that, for all $m$ in $\mathcal{M}_n$,

$$\Delta_{S_m,\alpha}(B_\beta) \leq c(\alpha, \beta)\, \phi_n(\alpha, \beta, S, S_m).$$

Definition 2.4. (Test) 

Let $(X_1, \dots, X_n)$ be an i.i.d. sample of real valued random variables. Let $S$ be a family of densities on $\mathbb{R}$. Let $S_0, S_1$ be two disjoint subsets of $S$. A test $T$ of the assumption $H_0 : s \in S_0$ against the alternative $H_1 : s \in S_1$ is a function $T : \mathbb{R}^n \to \{0, 1\}$. The test $T$ is said to have confidence level $1 - \alpha \in (0, 1)$ when

$$\forall s \in S_0, \quad P_s\left(T(X_1, \dots, X_n) = 0\right) \geq 1 - \alpha.$$

It is said to have power $1 - \beta \in (0, 1)$ when

$$\forall s \in S_1, \quad P_s\left(T(X_1, \dots, X_n) = 1\right) \geq 1 - \beta.$$

2.5 Main Assumptions 

Let $(S_m)_{m \in \mathcal{M}_n}$ be a collection of linear subspaces of $L^2(\mu)$, with finite dimensions respectively denoted by $(d_m)_{m \in \mathcal{M}_n}$. We make the following assumptions on this collection.

H1: There exists $m_n$ in $\mathcal{M}_n$ such that $S_{m_n} = \mathrm{Span}\left(\cup_{m \in \mathcal{M}_n} S_m\right)$.

H2: There exists a constant $C_1$ such that, for all $m$ in $\mathcal{M}_n$, for all $t$ in $S_m$,

$$\|t\|_\infty \leq C_1 \sqrt{d_m}\, \|t\|.$$

The last assumption is only technical and lets us simplify the results. Let $\beta$ be a real number in $(0, 1)$.

H3($\mathcal{M}$, $\beta$): For all $n \geq 2$, $N_n = \mathrm{Card}(\mathcal{M}_n)$ is finite and there exists a constant $C_{\mathcal{M}}$ such that, for all $n \geq 2$,

$$\frac{2\sqrt{d_{m_n}}\, \ln(6N_n/\beta)}{n} \leq C_{\mathcal{M}}.$$

Four examples are usually developed as fulfilling this set of assumptions:

[Hist] regular histogram spaces: for all $m$ in $\mathbb{N}^*$, $S_m$ is the space of all the functions constant on the partition $([k/m, (k+1)/m))_{k=0,\dots,m-1}$ of $[0, 1]$, $d_m = m$.

[T] trigonometric spaces: $S_m$ is the linear span of the functions $\psi_0(x) = 1_{[0,1]}(x)$, $\psi_{j,1}(x) = \sqrt{2}\cos(2\pi j x) 1_{[0,1]}(x)$ and $\psi_{j,2}(x) = \sqrt{2}\sin(2\pi j x) 1_{[0,1]}(x)$ for all $1 \leq j \leq J_m$, $d_m = 2J_m + 1$.

[P] regular piecewise polynomial spaces: $S_m$ is the linear span of the functions $(\psi_{j,k})$ for $j = 1, \dots, J_m$, $k = 0, \dots, r - 1$, where, for all $j = 1, \dots, J_m$ and $k = 0, \dots, r - 1$, $\psi_{j,k}$ is a polynomial of degree $k$ on $[(j-1)/J_m, j/J_m]$, $d_m = rJ_m$.

[W] spaces spanned by dyadic wavelets with regularity $r$.

We have to choose $d_{m_n} \leq Cn^2/(\ln n)^2$ and $\beta \geq n^{-r}$ for some $r > 0$ in order to fulfill Assumption H3($\mathcal{M}$, $\beta$). For a description of those spaces and their properties, we refer to Birgé & Massart [7]. Hereafter, in order to simplify the notations, we will often write $S_n, d_n, s_n, \dots$ instead of $S_{m_n}, d_{m_n}, s_{m_n}, \dots$



3 Technical tools 

This section presents the results required in Section 4 to build our adaptive confidence sets. Let $s$ be a density in $L^2(\mu)$ and let $s_m$ and $s_n$ denote respectively its orthogonal projections onto the linear spaces $S_m$ and $S_n$, where $S_m \subset S_n$. We recall the definition and some basic properties of the projection estimator $\hat{s}_m$ of $s$ on $S_m$ in Section 3.1. From Pythagoras' theorem, it satisfies

$$\|s - \hat{s}_m\|^2 = \|s - s_n\|^2 + \|s_n - s_m\|^2 + \|s_m - \hat{s}_m\|^2. \qquad (5)$$

Section 3.2 deals with the estimation of $\|s_m - \hat{s}_m\|^2$. We introduce our resampling estimator and state a very important concentration inequality (Theorem 3.3). In Section 3.3, we introduce our estimator of $\|s_n - s_m\|^2$ based on $U$-statistics.

3.1 Projection estimators 

Definition 3.1. (projection estimators) 

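Let $X_1, \dots, X_n$ be i.i.d. random variables with common density $s$ in $L^2(\mu)$. Let $S_m$ be a linear subspace of $L^2(\mu)$. The projection estimator of $s$ on $S_m$ is defined by

$$\hat{s}_m = \arg\min_{t \in S_m} \left\{\|t\|^2 - 2P_n t\right\}.$$

Classical computations show the following Lemma:

Lemma 3.2. Let $X_1, \dots, X_n$ be i.i.d. random variables with common density $s$ in $L^2(\mu)$. Let $S_m$ be a linear subspace of $L^2(\mu)$ and let $(\psi_\lambda)_{\lambda \in \Lambda_m}$ be an orthonormal basis of $S_m$. Let $s_m$ be the orthogonal projection of $s$ onto $S_m$ and let $\hat{s}_m$ be the projection estimator of $s$ onto $S_m$. Then,

$$s_m = \sum_{\lambda \in \Lambda_m} (P_s \psi_\lambda)\psi_\lambda, \qquad \hat{s}_m = \sum_{\lambda \in \Lambda_m} (P_n \psi_\lambda)\psi_\lambda, \qquad \|s_m - \hat{s}_m\|^2 = \sum_{\lambda \in \Lambda_m} \left[(P_n - P_s)\psi_\lambda\right]^2.$$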
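To make Lemma 3.2 concrete, the following Python sketch computes the projection estimator on a regular histogram space [Hist] with $d$ cells, using the orthonormal basis $\psi_k = \sqrt{d}\, 1_{[k/d, (k+1)/d)}$. The function names are hypothetical, and the last helper assumes that $s$ is the uniform density on $[0, 1]$ (so that $P_s \psi_k = 1/\sqrt{d}$), which is the setting of the simulation study.

```python
def histogram_coefficients(xs, d):
    """Return the coefficients P_n psi_k, k = 0..d-1, of the projection estimator.

    psi_k = sqrt(d) * indicator of [k/d, (k+1)/d); points outside [0, 1) are ignored.
    """
    n = len(xs)
    counts = [0] * d
    for x in xs:
        if 0.0 <= x < 1.0:
            counts[min(int(x * d), d - 1)] += 1
    return [(d ** 0.5) * c / n for c in counts]


def projection_estimator(xs, d):
    """Return the function x -> sum_k (P_n psi_k) psi_k(x)."""
    coeffs = histogram_coefficients(xs, d)

    def s_hat(x):
        if not 0.0 <= x < 1.0:
            return 0.0
        return (d ** 0.5) * coeffs[min(int(x * d), d - 1)]

    return s_hat


def squared_loss_uniform(xs, d):
    """||s_m - s_hat_m||^2 = sum_k ((P_n - P_s) psi_k)^2 for s uniform on [0, 1]."""
    coeffs = histogram_coefficients(xs, d)
    return sum((c - d ** -0.5) ** 2 for c in coeffs)


if __name__ == "__main__":
    sample = [0.05, 0.15, 0.55, 0.95]
    print(projection_estimator(sample, 2)(0.3), squared_loss_uniform(sample, 2))
```

On a histogram space the estimator reduces to the usual normalised cell counts, which is why the loss can be computed from the counts alone.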

3.2 Estimation of $\|s_m - \hat{s}_m\|^2$ by resampling methods

Let $s$ be a density in $L^2(\mu)$. Let $S_m$ be a finite dimensional linear subspace of $L^2(\mu)$ and let $(\psi_\lambda)_{\lambda \in \Lambda_m}$ be an orthonormal basis of $S_m$. Let $s_m$ denote the orthogonal projection of $s$ onto $S_m$ and let $\hat{s}_m$ denote the projection estimator of $s$ onto $S_m$. $\|s_m - \hat{s}_m\|^2$ is a functional of $P_n$ and $P_s$; therefore, it can be estimated by resampling. Indeed, let $(W_1, \dots, W_n)$ be a resampling scheme and let $\bar{W}_n = \sum_{i=1}^n W_i / n$. The resampling estimator of $\|s_m - \hat{s}_m\|^2$ given by Efron's heuristic (see Section 2.2) is defined for this resampling scheme and a suitably chosen constant $C_W$ by:

$$p_W(S_m) = C_W \sum_{\lambda \in \Lambda_m} E_W\left(\left[(P_n^W - \bar{W}_n P_n)\psi_\lambda\right]^2\right). \qquad (6)$$

$p_W(S_m)$ is well defined since we can check with the Cauchy-Schwarz inequality that

$$p_W(S_m) = C_W\, E_W\left(\left[\sup_{t \in S_m,\ \|t\| \leq 1} (P_n^W - \bar{W}_n P_n)t\right]^2\right).$$
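The conditional expectation in (6) can be computed without simulation when $C_W = \mathrm{Var}(W_1 - \bar{W}_n)^{-1}$, the choice made in Theorem 3.3. By exchangeability and $\sum_i (W_i - \bar{W}_n) = 0$, the covariance of $W_i - \bar{W}_n$ and $W_j - \bar{W}_n$ for $i \neq j$ is $-\mathrm{Var}(W_1 - \bar{W}_n)/(n-1)$. Expanding the square then gives $p_W(S_m) = \frac{1}{n-1}\sum_{\lambda \in \Lambda_m}\left[P_n \psi_\lambda^2 - (P_n \psi_\lambda)^2\right]$, in line with the expression of $p_W$ in Lemma 6.1. The sketch below implements this closed form; the function name and the input layout (`basis_values[k][i]` holding $\psi_k(X_i)$) are assumptions made for the example.

```python
def resampling_estimator_closed_form(basis_values):
    """Closed form of p_W(S_m) when C_W = 1 / Var(W_1 - mean(W)).

    basis_values[k][i] is the value psi_k(X_i) of the k-th basis function
    at the i-th observation. Returns
        sum_k (P_n psi_k^2 - (P_n psi_k)^2) / (n - 1).
    """
    n = len(basis_values[0])
    if n < 2:
        raise ValueError("at least two observations are required")
    total = 0.0
    for values in basis_values:
        mean = sum(values) / n
        mean_of_squares = sum(v * v for v in values) / n
        total += mean_of_squares - mean * mean
    return total / (n - 1)


if __name__ == "__main__":
    # One basis function observed at two points.
    print(resampling_estimator_closed_form([[1.0, 3.0]]))
```

In this form the estimator is the sum over the basis of the unbiased empirical variances of $\psi_\lambda(X_1)$, divided by $n$; a Monte Carlo average over resampling weights approximates the same quantity.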



The deviations of $p_W(S_m)$ are given by the following theorem.



Theorem 3.3. Let $S_m$ be a linear subspace of $L^2(\mu)$ with finite dimension $d_m$, satisfying H2, and let $C_3 > 0$. Let $X_1, \dots, X_n$ be an i.i.d. sample, let $(W_1, \dots, W_n)$ be a resampling scheme and let $p_W(S_m)$ be the associated random variable defined in (6) for $C_W = (\mathrm{Var}(W_1 - \bar{W}_n))^{-1}$. There exists a constant $\kappa_v(C_1, C_3)$ such that, for all $2 \leq x \leq C_3 n/\sqrt{d_m}$, for all densities $s$ in $L^2(\mu) \cap L^\infty(\mu)$,

$$P_s\left(\|s_m - \hat{s}_m\|^2 > p_W(S_m) + \kappa_v(C_1, C_3)\left(1 + \sqrt{\|s\|_\infty \wedge \|s\| d_m^{1/2} \wedge d_m}\right)\frac{\sqrt{d_m}\, x}{n}\right) \leq e^{-x/2}.$$

Comments: 

• This theorem is one of the main contributions of the article. It provides a sharp control of the variance term. It is the main difference with the article of Baraud, who worked in a Gaussian framework and handled this term with a concentration inequality for $\chi^2$-statistics of Birgé [?]. Our new construction is more general and can be easily adapted to other frameworks, which is not the case in Baraud [4].

• It is proved thanks to a technical lemma (Lemma 6.1) and a sharp concentration inequality (Lemma 6.2). Lemma 6.1 shows that, with our choice of $C_W$, $\|s_m - \hat{s}_m\|^2 - p_W(S_m)$ is a totally degenerate $U$-statistic of order 2. Lemma 6.2 is a concentration inequality for $U$-statistics of order 2.

• The proof of Lemma 6.2 is derived from Houdré & Reynaud-Bouret [15]; it follows mainly the one of Fromont & Laurent [11]. The main improvement compared with Fromont & Laurent [11] is that we work with general linear spaces $S_m$.

• The bound involves a term $\sqrt{\|s\|_\infty \wedge \|s\| d_m^{1/2} \wedge d_m}$. From a theoretical point of view, the term $\|s\| d_m^{1/2} \wedge d_m$ is useless asymptotically when $\|s\|_\infty$ is finite. In practice the $L^2$-norm of $s$ is often much smaller than its $L^\infty$-norm. Moreover, our control can also be used when $\|s\|_\infty$, $\|s\|$ or both of these quantities are unknown, since $\kappa_v(C_1, C_3)$ is free from $\|s\|$, $\|s\|_\infty$.

• The condition on $x$ is not a problem in practice. We are interested in cases where $1 - e^{-x/2}$ is large; therefore, $2 \leq x$ will always be satisfied. Moreover, we will see in Section 3 that the assumptions H3($\mathcal{M}$, $\beta$) are designed to ensure that the interesting $x$ satisfy $x \leq C_3 n/\sqrt{d_m}$ provided that $C_3$ is sufficiently large.

• This theorem can be used to build a model selection procedure for density estimation. Actually, an ideal penalty in this problem is given by $2\|s_m - \hat{s}_m\|^2$ and the aim of model selection is to evaluate this ideal penalty as precisely as possible. Theorem 3.3 provides such a control. This important application is discussed in detail in [24]. For an introduction to model selection, we refer to Massart [27]. The concept of ideal penalty is defined in Arlot [1].

• In order to keep the result as readable as possible, we only give the explicit form of the constant $\kappa_v(C_1, C_3)$ in the proof of Theorem 3.3.

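Corollary 3.4. Let $X_1, \dots, X_n$ be i.i.d. real valued random variables. Let $(S_m)_{m \in \mathcal{M}_n}$ be a collection of finite dimensional linear spaces satisfying H1, H2. Let $\beta \in (0, 1)$ be such that this collection satisfies also H3($\mathcal{M}$, $\beta$) and let $M_2 > 0$, $M_\infty > 0$. Let $(W_1, \dots, W_n)$ be a resampling scheme and let $p_W(S_m)$ be the associated resampling estimator defined in Theorem 3.3. Let $\kappa_v(C_1, C_{\mathcal{M}})$ be the constant defined in Theorem 3.3 for $C_3 = C_{\mathcal{M}}$, let $x_n = 2\ln(2N_n/\beta) \vee 2$ and let

$$V(m, \beta, X_1, \dots, X_n) = p_W(S_m) + \kappa_v(C_1, C_{\mathcal{M}})\left(1 + \sqrt{M_\infty \wedge M_2 d_m^{1/2} \wedge d_m}\right)\frac{\sqrt{d_m}\, x_n}{n}. \qquad (7)$$

Then, for all densities $s$ in $L^2(\mu) \cap L^\infty(\mu)$ such that $\|s\| \leq M_2$ and $\|s\|_\infty \leq M_\infty$,

$$P_s\left(\exists m \in \mathcal{M}_n,\ \|s_m - \hat{s}_m\|^2 > V(m, \beta, X_1, \dots, X_n)\right) \leq \frac{\beta}{2}.$$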

Comments: 

• This corollary gives a uniform upper bound $V(m, \beta, X_1, \dots, X_n)$ on the variance term.

• The size of this uniform bound, in the sense of (2), is given by the following Theorem.

Theorem 3.5. Let $X_1, \dots, X_n$ be i.i.d. real valued random variables. Let $(S_m)_{m \in \mathcal{M}_n}$ be a collection of linear spaces satisfying H1, H2. Let $\alpha, \beta$ be real numbers in $(0, 1)$ such that this collection satisfies also H3($\mathcal{M}$, $\alpha$) and H3($\mathcal{M}$, $\beta$). Let $M_2 > 0$, $M_\infty > 0$ and let $V_{m,\beta} = V(m, \beta, X_1, \dots, X_n)$ be the associated random variables defined in (7). There exists a constant $\kappa$, free from $d_m$, $M_2$, $M_\infty$, $\alpha$, $\beta$, such that, for all $m$ in $\mathcal{M}_n$,

$$\Delta_{B_{2,\infty}(M_2, M_\infty, 0, L^2(\mu)),\alpha}(V_{m,\beta}) \leq \kappa\left[\frac{d_m}{n} + \left(1 + \sqrt{M_\infty \wedge M_2 d_m^{1/2} \wedge d_m}\right)\frac{\sqrt{d_m}}{n}\ln\left(\frac{N_n}{\alpha\beta}\right)\right].$$

Comments:

• For fixed confidence levels $\alpha$, $\beta$, the asymptotic order of magnitude of $V_{m,\beta}$ is $d_m/n$ for all models with dimension $d_m \geq (\ln N_n)^2$.

3.3 Estimation of $\|s_n - s_m\|^2$

The following simple lemma is important to understand our procedure.

Lemma 3.6. Let $X_1, \dots, X_n$ be i.i.d. real valued random variables with common density $s$ in $L^2(\mu)$. Let $S_m \subset S_n$ be two linear subspaces of $L^2(\mu)$, with respective finite dimensions $d_m$ and $d_n$. Let $s_m$ and $s_n$ be the orthogonal projections of $s$ respectively onto $S_m$ and $S_n$. Let $(\psi_\lambda)_{\lambda \in \Lambda_n}$ be an orthonormal basis of $S_n$ such that $(\psi_\lambda)_{\lambda \in \Lambda_m}$ is an orthonormal basis of $S_m$, with $\Lambda_m \subset \Lambda_n$. Then

$$\|s_n - s_m\|^2 = \sum_{\lambda \in \Lambda_n - \Lambda_m} (P_s \psi_\lambda)^2 = E_s\left(\frac{1}{n(n-1)} \sum_{i \neq j = 1}^n\ \sum_{\lambda \in \Lambda_n - \Lambda_m} \psi_\lambda(X_i)\psi_\lambda(X_j)\right). \qquad (8)$$

Based on this kind of lemma, Laurent [21], [22] introduced estimators based on $U$-statistics to estimate quadratic functionals of a density. These estimators were successfully used by Fromont & Laurent [11] for goodness of fit tests in a density estimation model, and by Robins & van der Vaart [28] to build adaptive confidence sets. We follow the same steps here and define, for any observation $X_1, \dots, X_n$, for all finite dimensional linear spaces $S_m \subset S_n$, for all orthonormal bases $(\psi_\lambda)_{\lambda \in \Lambda_n}$ of $S_n$ such that $(\psi_\lambda)_{\lambda \in \Lambda_m}$ is an orthonormal basis of $S_m$, with $\Lambda_m \subset \Lambda_n$,

$$p_b(S_m, S_n) = \frac{1}{n(n-1)} \sum_{i \neq j = 1}^n\ \sum_{\lambda \in \Lambda_n - \Lambda_m} \psi_\lambda(X_i)\psi_\lambda(X_j). \qquad (9)$$

$p_b(S_m, S_n)$ is well defined since we can prove with the Cauchy-Schwarz inequality that, if $S_m^\perp$ denotes the orthogonal of $S_m$ in $S_n$,

$$p_b(S_m, S_n) = \frac{n}{n-1} \sup_{t \in B(S_m^\perp)} (P_n t)^2 - \frac{1}{n-1} P_n\left(\sup_{t \in B(S_m^\perp)} t^2\right).$$
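The estimator (9) can be evaluated in time linear in $n$ for each basis function, using the identity $\sum_{i \neq j} a_i a_j = (\sum_i a_i)^2 - \sum_i a_i^2$. The Python sketch below does this for the nested trigonometric spaces [T], where $S_m$ and $S_n$ correspond to the frequencies $1, \dots, J_m$ and $1, \dots, J_n$ with $J_m < J_n$. The function name and its arguments are hypothetical, and the observations are assumed to lie in $[0, 1]$.

```python
import math


def u_statistic_estimator(xs, j_m, j_n):
    """U-statistic estimator of ||s_n - s_m||^2 for trigonometric spaces.

    The basis functions in Lambda_n - Lambda_m are sqrt(2) cos(2 pi j x) and
    sqrt(2) sin(2 pi j x) for j = j_m + 1, ..., j_n (observations in [0, 1]).
    """
    n = len(xs)
    if n < 2:
        raise ValueError("at least two observations are required")
    total = 0.0
    for j in range(j_m + 1, j_n + 1):
        for trig in (math.cos, math.sin):
            values = [math.sqrt(2.0) * trig(2.0 * math.pi * j * x) for x in xs]
            plain_sum = sum(values)
            sum_of_squares = sum(v * v for v in values)
            # Sum over ordered pairs i != j of psi(X_i) * psi(X_j).
            total += plain_sum * plain_sum - sum_of_squares
    return total / (n * (n - 1))


if __name__ == "__main__":
    print(u_statistic_estimator([0.0, 0.5], 0, 1))
```

Unlike a squared norm, this estimator can take negative values: removing the diagonal terms is what makes it unbiased for $\|s_n - s_m\|^2$.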

The deviations of $p_b(S_m, S_n)$ are given by the following result:

Lemma 3.7. Let $X_1, \dots, X_n$ be i.i.d. real valued random variables. Let $S_m \subset S_n$ be two linear subspaces of $L^2(\mu)$, with respective finite dimensions $d_m$ and $d_n$, and let $p_b(S_m, S_n)$ be the estimator defined in (9). For any density $s$ in $L^2(\mu)$, let $s_n$ and $s_m$ denote its orthogonal projections respectively onto $S_n$ and $S_m$. For all $C_3 > 0$ and all $\epsilon$ in $(0, 1)$, there exists a real constant $\kappa_b(\epsilon, C_3)$ such that, for all $2 \leq x \leq C_3 n/\sqrt{d_n}$, for all densities $s$ in $L^2(\mu) \cap L^\infty(\mu)$, with $P_s$-probability larger than $1 - 3e^{-x/2}$,

$$\left|p_b(S_m, S_n) - \|s_n - s_m\|^2\right| \leq \epsilon\|s_n - s_m\|^2 + \kappa_b(\epsilon, C_3)\left(1 + \sqrt{\|s\|_\infty \wedge \|s\| d_n^{1/2}}\right)\frac{\sqrt{d_n}\, x}{n}.$$

Thanks to this Lemma, we can derive the following corollary that gives our estimation of $\|s_n - s_m\|^2$.

Corollary 3.8. Let $X_1, \dots, X_n$ be i.i.d. real valued random variables. Let $(S_m)_{m \in \mathcal{M}_n}$ be a collection of linear spaces satisfying assumptions H1, H2. Let $\beta$ be a real number in $(0, 1)$ such that this collection satisfies also H3($\mathcal{M}$, $\beta$). Let $M_2 > 0$, $M_\infty > 0$, $x_n = 2\ln(6N_n/\beta) \vee 2$. Let $p_b$ be defined in (9) and, for all $\epsilon$ in $(0, 1)$, let $\kappa_b(\epsilon, C_{\mathcal{M}})$ be the constant defined in Lemma 3.7 for $C_3 = C_{\mathcal{M}}$. For all $m \in \mathcal{M}_n$, let

$$K(m, \beta, X_1, \dots, X_n) = \inf_{\epsilon \in (0,1)}\left\{\frac{p_b(S_m, S_n)}{1 - \epsilon} + \frac{\kappa_b(\epsilon, C_{\mathcal{M}})}{1 - \epsilon}\left(1 + \sqrt{M_\infty \wedge M_2 d_n^{1/2}}\right)\frac{\sqrt{d_n}\, x_n}{n}\right\}. \qquad (10)$$

Then, for all densities $s$ in $B_{2,\infty}(M_2, M_\infty, 0, L^2(\mu))$,

$$P_s\left(\exists m \in \mathcal{M}_n,\ \|s_n - s_m\|^2 > K(m, \beta, X_1, \dots, X_n)\right) \leq \frac{\beta}{2}.$$

Comments:

• This corollary gives a sharp estimation of the bias term. In particular, we will see in the following section that the term $\sqrt{d_n}\, x_n/n$ is essentially necessary.

• We obtain a bound valid for all the models in the collection $\mathcal{M}_n$. Combined with Corollary 3.4, it gives all the tools required to apply our method of selection.

4 Main results 

4.1 Adaptive Confidence Balls 

We can now easily present our model selection procedure to obtain CS.

Construction of the adaptive CS

Let $\beta$ be a real number in $(0, 1)$, let $M_2 > 0$, $M_\infty > 0$, let $(S_m)_{m \in \mathcal{M}_n}$ be a collection of finite dimensional linear spaces and let $S_n = \mathrm{Span}\left(\cup_{m \in \mathcal{M}_n} S_m\right)$. Let $(V(m, \beta, X_1, \dots, X_n))_{m \in \mathcal{M}_n}$ be the collection defined in (7), let $(K(m, \beta, X_1, \dots, X_n))_{m \in \mathcal{M}_n}$ be the collection defined in (10) and let $\eta$ be a positive real number. For all $m$ in $\mathcal{M}_n$, let

$$\rho(m, \eta, \beta) = \sqrt{\eta^2 + K(m, \beta, X_1, \dots, X_n) + V(m, \beta, X_1, \dots, X_n)}.$$

Recall the definition of the $L^2$-ball centered at an element $t$ of $L^2(\mu)$ with radius $C$ in $\mathbb{R}_+$ given in (3). Our final CS is defined by

$$B_{\beta,\eta} = B_2\left(\hat{s}_{\hat{m}}, \rho(\hat{m}, \eta, \beta), L^2(\mu)\right), \quad \text{where } \hat{m} = \arg\min_{m \in \mathcal{M}_n} \{\rho(m, \eta, \beta)\}. \qquad (11)$$
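The selection step (11) is a plain minimisation once the two bounds are available. The following Python sketch assumes that $K(m, \beta, X_1, \dots, X_n)$ and $V(m, \beta, X_1, \dots, X_n)$ have already been computed for each model and are passed as dictionaries keyed by $m$; the function name and this input layout are hypothetical. Since $p_b$ can be negative, the sketch clips the quantity under the square root at zero, which is a choice made for the example and not part of the construction above.

```python
import math


def select_confidence_ball(eta, bias_bounds, variance_bounds):
    """Return (m_hat, radius) minimising rho(m) = sqrt(eta^2 + K(m) + V(m)).

    bias_bounds[m] plays the role of K(m, beta, X_1..X_n) and
    variance_bounds[m] the role of V(m, beta, X_1..X_n).
    """
    radii = {}
    for m in bias_bounds:
        squared_radius = eta ** 2 + bias_bounds[m] + variance_bounds[m]
        # Clipping at zero is an assumption of this sketch.
        radii[m] = math.sqrt(max(squared_radius, 0.0))
    m_hat = min(radii, key=radii.get)
    return m_hat, radii[m_hat]


if __name__ == "__main__":
    K = {1: 0.5, 2: 0.1, 3: 0.05}
    V = {1: 0.01, 2: 0.05, 3: 0.2}
    print(select_confidence_ball(0.1, K, V))
```

The confidence ball is then the $L^2$-ball centred at the projection estimator of the selected model, with the returned radius.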

Performances of our CS 

Theorem 4.1. Let $X_1, \dots, X_n$ be i.i.d. real valued random variables. Let $(S_m)_{m \in \mathcal{M}_n}$ be a collection of models satisfying assumptions H1, H2. Let $\beta$ be a real number in $(0, 1)$ such that this collection satisfies also H3($\mathcal{M}$, $\beta$). Let $M_2 > 0$, $M_\infty > 0$, $\eta > 0$ and let $B_{2,\infty}(M_2, M_\infty, \eta, S_n)$ be the ball defined in (4).

Then $B_{\beta,\eta}$, defined in (11), belongs to $CS(B_{2,\infty}(M_2, M_\infty, \eta, S_n), \beta)$.

Moreover, there exists a constant $\kappa$ such that for all $m$ in $\mathcal{M}_n$, for all $\eta_m > 0$ and all $\alpha$ such that $(S_m)_{m \in \mathcal{M}_n}$ satisfies also H3($\mathcal{M}$, $\alpha$),

^B2MM2,Moo,Vm,S,„)ABl3,v) ^ ^ ( Wm + " ) ^ (^ + ' ) 1 • (12) 

Comments: 

• Theorem 4.1 gives CS over $B_{2,\infty}(M_2, M_\infty, \eta, S_n)$, with prescribed confidence level $\beta$, valid for all $n \geq 2$.

• The size of these CS is upper bounded by the maximum of two terms. $\eta^2 + \sqrt{d_n}/n$ is the minimax separation rate for the tests $H_0 : s = s_0$ against the alternative $H_1 : s \in B_{2,\infty}(M_2, M_\infty, \eta, S_n) - \{s_0\}$, where $s_0$ is some element in $S_m$. $\eta_m^2 + d_m/n$ is the minimax estimation rate over $B_{2,\infty}(M_2, M_\infty, \eta_m, S_m)$.

• Robins & van der Vaart [28] proved that these rates are optimal asymptotically. We will show in Theorem 4.2 below that this property also holds non-asymptotically.

• $\rho(m, \eta, \beta)$ has basically the following form

p^{m,r],(3) = 7]^ +Pb{Sm,Sn) +PwiSm) + k(M2,Mo< 



2^ fl^ 2 , /e e ^ I fc \ < fi\^ t\4 ^ V(41n(iVn/(a/3)) 

. I /^^ _ „ + pf,{Sm,Sn) + PwiSm) + n{M2,Moo) ■ 

n 



It depends in practice on two unknown constants, $\eta$ and $\kappa(M_2, M_\infty)$. We believe that some "slope heuristic" method (see Birgé & Massart [8], Arlot & Massart [3] or [23]) can be developed for CS in order to obtain a data driven estimate of $\kappa(M_2, M_\infty)$. This estimate would probably be more reasonable than the upper bound given in our proof. On the other hand, we believe that the constant $\eta$ can only be handled with suitably chosen assumptions, for example some regularity assumption as in Section 4.3 below.



• Baraud [4] used an almost identical procedure in a regression framework. He defined, for all $m$ in $\mathcal{M}_n$, a test $T_m$ of the null hypothesis $s_n \in S_m$ against the alternative $s_n \in S_n - S_m$, and some positive number $\rho(m)$. His $\rho(m)$'s are calibrated to satisfy the property that, if $T_m$ accepts the null, then, with probability close to one, the distance between $s$ and its projection estimator $\hat{s}_m$ is not larger than $\rho(m)$. He selected $\hat{m}$ as the minimizer of $\rho(m)$ among those $m$ for which $T_m$ accepts the null and defined the confidence ball as the $L^2$-ball centered at $\hat{s}_{\hat{m}}$ of radius $\rho(\hat{m})$. The main difference with this general scheme is that our procedure does not require a series of tests to work, as the bound given in Corollary 3.8 holds for all $m$.

4.2 Optimality of our balls 

In this section we prove that the rate given in (12) cannot be improved in general, from a minimax point of view. The result is stated in the following theorem:

Theorem 4.2. Let $S_n$ be the set of histograms on $\{[k/d_n, (k+1)/d_n),\ k = 0, \dots, d_n - 1\}$ and let $S_m$ be the linear subspace of $S_n$ of histograms on $\{[k/d_m, (k+1)/d_m),\ k = 0, \dots, d_m - 1\}$. Let $\alpha, \beta$ be real numbers in $(0, 1)$ such that $2\alpha + \beta < 1$. There exists a constant $C(\alpha, \beta)$ such that

\ n n 

Comments: 

• Theorem 4.2 gives the optimality of the rate given in (12), since the terms $\eta$ and $\eta_m$ can obviously not be avoided either.

• The key point of the proof (Lemma 6.8) is that we cannot build a test of the null hypothesis $H_0 : s \in S_m$ against the alternative $H_1 : s \in S_n$, $s \notin S_m$ with separation rate smaller than $C_{\alpha,\beta}\sqrt{d_n}/n$. This extends the result of Ingster [16], [17], [18] to a non-asymptotic framework and the result of Baraud [4] to density estimation. For a definition of the separation rate, we refer to Ingster [16], [17], [18].

• The proof follows the methodology described in Baraud [4].

4.3 Application to regular density 

This section presents the application of Theorem 4.1 to regular densities. In particular, we extend the result of Robins & van der Vaart [28] since (1) is obtained for all $n$.

Fourier spaces:

For all $k$ in $\mathbb{N}^*$, for all $x$ in $\mathbb{R}$, let

$$\psi_{1,k}(x) = \sqrt{2}\cos(2\pi k x) 1_{[0,1]}(x), \qquad \psi_{2,k}(x) = \sqrt{2}\sin(2\pi k x) 1_{[0,1]}(x).$$

For all $d$ in $\mathbb{N}$, let $F_d$ be the linear space spanned by the functions $1_{[0,1]}$, $\psi_{1,k}$, $\psi_{2,k}$, for all $k$ in $\{1, \dots, d\}$. $F_d$ is a subspace of $L^2(\mu)$. It is a classical result (see for example Birgé & Massart [7]) that any sub-collection of $(F_{d_m})_{0 < d_m \leq n^2(\ln n)^{-2}}$ satisfies H1, H2 with $C_1 = 1$. We can also
easily check that, for all /3 > n^^, it satisfies also H3(A^, /3) with Cm = 4. 

Sobolev Spaces: 

For all functions $t$ in $L^2(\mu)$, let

$$t_0 = \int_{\mathbb{R}} t(x) 1_{[0,1]}(x)\,d\mu(x) = \int_0^1 t(x)\,d\mu(x)$$

and for all $k \in \mathbb{N}^*$, let

$$t_{1,k} = \int_{\mathbb{R}} t(x)\psi_{1,k}(x)\,d\mu(x), \qquad t_{2,k} = \int_{\mathbb{R}} t(x)\psi_{2,k}(x)\,d\mu(x).$$

For all $\gamma \in \mathbb{R}_+^*$, for all $M$ in $\mathbb{R}_+$, we denote by $S(\gamma, M)$ the set of functions $t$ in $L^2(\mu)$ such that

It is clear that for all $t$ in $S(\gamma, M)$, $\|t\| \leq M$ and for all $d$ in $\mathbb{N}$, if $\pi_{F_d}(t)$ denotes the orthogonal projection of $t$ onto $F_d$,

t2 



i>d ^ ' i>d ^ ' 

We can also use the Cauchy-Schwarz inequality to prove that, when $\gamma > 1/2$, for all $x$ in $[0, 1]$,

\ VieN / ViGN \ ^ ) ) 

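Hence, when $\gamma > 1/2$, for all $t$ in $S(\gamma, M)$, $\|t\|_\infty \leq 2M\sqrt{\sum_{i \in \mathbb{N}} (1 + i)^{-2\gamma}}$. When $\gamma > 1/2$, let $M_\infty = 2M\sqrt{\sum_{i \in \mathbb{N}} (i + 1)^{-2\gamma}}$ and when $\gamma \leq 1/2$, let $M_\infty$ denote a positive real number. We have obtained that

$$S(\gamma, M, M_\infty) := \{t \in S(\gamma, M),\ \|t\|_\infty \leq M_\infty\} \subset B_{2,\infty}\left(M, M_\infty, M(d + 1)^{-\gamma}, F_d\right). \qquad (13)$$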

Hence, the following proposition holds. 

Proposition 4.3. We keep the previous notations. Let $\gamma$, $M$, $M_\infty$ be strictly positive real numbers, let $d_n$ denote the integer part of $n^{2/(4\gamma+1)} \wedge n^2(\ln n)^{-2}$ and let $\mathcal{M}_n = \{1, \dots, d_n\}$. Let $B_{\beta, M(d_n+1)^{-\gamma}}$ be the set defined in Theorem 4.1 for the collection $(F_{d_m})_{d_m \in \mathcal{M}_n}$. Then, $B_{\beta, M(d_n+1)^{-\gamma}}$ belongs to $CS(S(\gamma, M, M_\infty), \beta)$.
There exists a constant $\kappa$ free from $n$ such that, for all $\gamma' \geq \gamma$,

$$\Delta_{S(\gamma', M, M_\infty),\alpha}\left(B_{\beta, M(d_n+1)^{-\gamma}}\right) \leq \kappa\left(n^{-\gamma'/(2\gamma'+1)} \vee (\ln n)\, n^{-2\gamma/(4\gamma+1)}\right).$$

Comments:

• This result can be compared with the one of Robins & van der Vaart [28]. Our balls satisfy the covering property (1) for all $n$ and not asymptotically as in their paper. They proved that the rate $n^{-\gamma'/(2\gamma'+1)} \vee n^{-2\gamma/(4\gamma+1)}$ is asymptotically optimal.

• It is a straightforward consequence of Theorem 4.1 applied with $\eta_m = M(d_m + 1)^{-\gamma'}$, $\eta = M(d_n + 1)^{-\gamma}$ and the previous computations; therefore, the proof is omitted.
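As a small worked example of the quantities in Proposition 4.3, the sketch below evaluates the largest dimension $d_n$ and the order of the upper bound on the size, up to the constant $\kappa$. It assumes that $d_n$ is the integer part of $n^{2/(4\gamma+1)} \wedge n^2(\ln n)^{-2}$ and that the bound reads $n^{-\gamma'/(2\gamma'+1)} \vee (\ln n)\, n^{-2\gamma/(4\gamma+1)}$; the function names are hypothetical.

```python
import math


def max_dimension(n, gamma):
    """Integer part of min(n^(2 / (4 gamma + 1)), n^2 / (ln n)^2), for n >= 2."""
    return int(min(n ** (2.0 / (4.0 * gamma + 1.0)), n ** 2 / math.log(n) ** 2))


def size_rate(n, gamma, gamma_prime):
    """max(n^(-g'/(2g'+1)), ln(n) * n^(-2g/(4g+1))), the bound up to a constant."""
    estimation_part = n ** (-gamma_prime / (2.0 * gamma_prime + 1.0))
    testing_part = math.log(n) * n ** (-2.0 * gamma / (4.0 * gamma + 1.0))
    return max(estimation_part, testing_part)


if __name__ == "__main__":
    print(max_dimension(1000, 1.0), size_rate(1000, 1.0, 1.0))
```

For $n = 1000$ and $\gamma = \gamma' = 1$ the second term dominates, which illustrates that adaptation to smoothness $\gamma' > \gamma$ is limited by the testing rate.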

5 Simulation study. 

In this section, our first goal is to illustrate Theorem 3.3. We proved that the difference $\|s_m - \hat{s}_m\|^2 - p_W(S_m)$ is upper bounded by $\sqrt{d_m}/n$, up to a constant; we will show on some simulations that this bound is sharp. Then, we will consider a more general version of Efron's heuristic, which states that, for a good choice of the constant $C_W$, the distribution of $\|s_m - \hat{s}_m\|^2$ is close to the conditional distribution $\mathcal{D}_W\left(C_W \sum_{\lambda \in \Lambda_m} [(P_n^W - \bar{W}_n P_n)\psi_\lambda]^2\right)$. The quantiles of $\|s_m - \hat{s}_m\|^2$ must then be close to their resampled counterparts. In a second simulation, we test this method and observe that it gives very good practical results.




5.1 Illustration of Theorem 3.3



In this simulation, $s$ is the uniform density on $[0, 1]$ and $S_m$ is the set of histograms on the partition $([(k-1)/d_m, k/d_m))_{k=1,\dots,d_m}$. $(W_1, \dots, W_n)$ are Efron's weights, i.e. the distribution $\mathcal{D}(W_1, \dots, W_n)$ is the multinomial distribution $\mathcal{M}(n, 1/n, \dots, 1/n)$. In order to compute $p_W(S_m)$, we estimate the conditional expectation $E_W\left(\sum_{\lambda \in \Lambda_m} [(P_n^W - \bar{W}_n P_n)\psi_\lambda]^2\right)$ by a Monte Carlo method with $n_b$ repetitions. Finally, we repeat the experiment $p = 1000$ times. We plot the histograms of the $p$ values of the normalized difference $n(\|s_m - \hat{s}_m\|^2 - p_W(S_m))/\sqrt{d_m}$. The first histogram is obtained with $n = 50$, $d_m = 10$, $n_b = 100$ and the second with $n = 200$, $d_m = 50$, $n_b = 500$.

Figure 1: $\frac{n}{\sqrt{d_m}}\left(\|s_m - \hat{s}_m\|^2 - p_W(S_m)\right)$.



Comments: 



The distribution of $n(\|s_m - \hat{s}_m\|^2 - p_W(S_m))/\sqrt{d_m}$ does not change with $n$ or $d_m$. This shows that the result of Theorem 3.3 is sharp in this example, at least up to the constant in front of the remainder term.
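The experiment of this section can be sketched in a few lines of Python. The code below is an illustrative reimplementation under the stated setting (uniform density, regular histograms, Efron's weights with $C_W = n/(n-1)$, Monte Carlo approximation of $E_W$); it is not the code used to produce Figure 1, and the function name is hypothetical.

```python
import random


def normalized_difference(n, d, n_b, rng):
    """Return n * (||s_m - s_hat_m||^2 - p_W(S_m)) / sqrt(d) for one sample.

    s is uniform on [0, 1], S_m is the regular histogram space with d cells,
    and p_W is approximated with n_b draws of Efron's multinomial weights.
    """
    cells = [min(int(rng.random() * d), d - 1) for _ in range(n)]
    counts = [0] * d
    for c in cells:
        counts[c] += 1
    # ||s_m - s_hat_m||^2 = sum_k (sqrt(d) * counts_k / n - 1 / sqrt(d))^2.
    loss = sum((d ** 0.5 * c / n - d ** -0.5) ** 2 for c in counts)

    acc = 0.0
    for _ in range(n_b):
        boot = [0] * d
        for _ in range(n):
            boot[cells[rng.randrange(n)]] += 1
        # (P_n^W - P_n) psi_k = sqrt(d) * (boot_k - counts_k) / n, since mean(W) = 1.
        acc += sum(d * ((b - c) / n) ** 2 for b, c in zip(boot, counts))
    p_w = (n / (n - 1)) * acc / n_b
    return n * (loss - p_w) / d ** 0.5


if __name__ == "__main__":
    rng = random.Random(0)
    values = [normalized_difference(50, 10, 100, rng) for _ in range(20)]
    print(min(values), max(values))
```

Repeating the call many times for several pairs $(n, d_m)$ and plotting the histograms of the returned values reproduces the kind of comparison described above.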



5.2 Illustration of the second Efron's heuristic 



In this simulation, we keep the same $s$ and the same resampling scheme. $S_m$ is the set of functions constant on the partition $([(k-1)/d_m, k/d_m))_{k=1,\dots,d_m}$, with $d_m = 50$, $n = 100$, $N = 100$, and $((X_i^J)_{i=1,\dots,n})_{J=1,\dots,N}$ are $N$ independent samples with common law $P_s$. For all $J = 1, \dots, N$, we compute the projection estimator $\hat{s}_m^J$ on $S_m$ with the sample $(X_i^J)_{i=1,\dots,n}$. Then, we take $n_b = 10000$ resampling schemes $(W_1, \dots, W_n)$. For each resampling scheme, we compute the quantity

$$C_W \sum_{\lambda \in \Lambda_m} \left[(P_n^W - \bar{W}_n P_n)\psi_\lambda\right]^2$$

and we obtain an approximation of the $(1 - \alpha)$-quantiles $q_\alpha^J$ of its conditional distribution $\mathcal{D}_W(p_W(S_m))$. We plot the frequency of $J$ such that $\|s_m - \hat{s}_m^J\|^2 \leq q_\alpha^J$ and the function $f(\alpha) = \alpha$ when $\alpha$ varies in $(0.5, 1)$ in the following curves.

[Figure: empirical coverage frequency plotted against the level.]



Comments 

• The covering property of this empirical ball is very close to the one we would like to obtain. Hence, this method seems to give sharp confidence balls for $s_m$. The computation time is the same as in the first method.

• We do not provide any theoretical evidence of this covering property. In particular, we cannot guarantee that $P_s(\|s_m - \hat{s}_m\|^2 \leq q_\alpha) \geq 1 - \alpha$ occurs for any $n$.
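The second experiment can be sketched in the same way. The Python code below is an illustrative reimplementation, not the code used for the figure: for each simulated sample it approximates a quantile of the conditional distribution of the resampled statistic by an order statistic of $n_b$ Monte Carlo draws, then reports how often the true loss falls below it. The function names and the simple order-statistic rule for the quantile are assumptions of this sketch.

```python
import random


def cell_counts(cells, d):
    """Number of observations in each of the d histogram cells."""
    counts = [0] * d
    for c in cells:
        counts[c] += 1
    return counts


def resampled_quantile(cells, d, n_b, level, rng):
    """Approximate the level-quantile of C_W * sum_k ((P_n^W - P_n) psi_k)^2."""
    n = len(cells)
    counts = cell_counts(cells, d)
    draws = []
    for _ in range(n_b):
        boot = [0] * d
        for _ in range(n):
            boot[cells[rng.randrange(n)]] += 1
        stat = sum(d * ((b - c) / n) ** 2 for b, c in zip(boot, counts))
        draws.append((n / (n - 1)) * stat)
    draws.sort()
    return draws[min(n_b - 1, int(level * n_b))]


def coverage_frequency(n, d, n_samples, n_b, level, rng):
    """Frequency of samples with ||s_m - s_hat_m||^2 <= resampled quantile."""
    hits = 0
    for _ in range(n_samples):
        cells = [min(int(rng.random() * d), d - 1) for _ in range(n)]
        counts = cell_counts(cells, d)
        loss = sum((d ** 0.5 * c / n - d ** -0.5) ** 2 for c in counts)
        if loss <= resampled_quantile(cells, d, n_b, level, rng):
            hits += 1
    return hits / n_samples


if __name__ == "__main__":
    print(coverage_frequency(100, 50, 20, 200, 0.9, random.Random(1)))
```

The parameters in the usage line are deliberately smaller than those of the paper ($N = 100$, $n_b = 10000$) to keep the run short; larger values give a smoother coverage curve.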

Acknowledgements: The author gratefully thanks Beatrice Laurent and Clementine Prieur for much fruitful advice.

He would also like to thank the reviewers and the associate editors, who helped to improve a first version of the article.

6 Proofs. 

6.1 Proof of Theorem 3.3

The theorem can easily be deduced from the following Lemmas, whose proofs are postponed to 
the appendix. 

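Lemma 6.1. Let $X_1, \dots, X_n$ be an i.i.d. sample with common density $s$ in $L^2(\mu)$ and let $(\psi_\lambda)_{\lambda \in \Lambda}$ be an orthonormal system in $L^2(\mu)$. Let $W_1, \dots, W_n$ be a resampling scheme, let $\bar{W}_n = n^{-1}\sum_{i=1}^n W_i$ and let $C_W = \mathrm{Var}(W_1 - \bar{W}_n)^{-1}$.
Let $T_s(\Lambda) = \sum_{\lambda \in \Lambda} (\psi_\lambda - P_s\psi_\lambda)^2$,

$$p_s(\Lambda) = \sum_{\lambda \in \Lambda} \left[(P_n - P_s)\psi_\lambda\right]^2, \qquad p_W(\Lambda) = C_W E_W\left(\sum_{\lambda \in \Lambda} \left[(P_n^W - \bar{W}_n P_n)\psi_\lambda\right]^2\right),$$

$$U_s(\Lambda) = \frac{1}{n(n-1)} \sum_{i \neq j = 1}^n\ \sum_{\lambda \in \Lambda} (\psi_\lambda(X_i) - P_s\psi_\lambda)(\psi_\lambda(X_j) - P_s\psi_\lambda).$$

Then

$$p_s(\Lambda) = \frac{1}{n} P_n T_s(\Lambda) + \frac{n-1}{n} U_s(\Lambda), \qquad p_W(\Lambda) = \frac{1}{n} P_n T_s(\Lambda) - \frac{1}{n} U_s(\Lambda), \qquad p_s(\Lambda) - p_W(\Lambda) = U_s(\Lambda).$$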
n n n n 
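
The identities of Lemma 6.1 are purely algebraic, so they can be checked numerically on a single sample. The sketch below does this under assumptions made only for illustration: the basis is the histogram basis on $d$ regular bins of $[0,1)$, the density $s$ is uniform (so that $P_s\psi_k = 1/\sqrt{d}$), and the weights are Efron's multinomial weights, whose covariance matrix is $I - n^{-1}\mathbf{1}\mathbf{1}^T$. The function name `lemma_6_1_terms` is hypothetical.

```python
import numpy as np

def lemma_6_1_terms(x, d):
    """Return (P_n T_s, U_s, p_s, p_W) for a sample x in [0, 1) and d regular bins."""
    n = len(x)
    bins = np.minimum((x * d).astype(int), d - 1)
    # psi[i, k] = psi_k(X_i) for the histogram basis psi_k = sqrt(d) * 1_{bin k}.
    psi = np.sqrt(d) * (bins[:, None] == np.arange(d)[None, :])
    # ASSUMPTION: s uniform on [0, 1), hence P_s psi_k = 1 / sqrt(d).
    centred = psi - 1.0 / np.sqrt(d)
    # gram[i, j] = sum_k (psi_k(X_i) - P_s psi_k) (psi_k(X_j) - P_s psi_k).
    gram = centred @ centred.T
    pn_t = np.trace(gram) / n
    u_s = (gram.sum() - np.trace(gram)) / (n * (n - 1))
    p_s = np.sum(centred.mean(axis=0) ** 2)
    # ASSUMPTION: Efron weights; Cov(W_i, W_j) = 1{i=j} - 1/n and mean(W) = 1.
    cov_w = np.eye(n) - np.ones((n, n)) / n
    c_w = 1.0 / (1.0 - 1.0 / n)
    p_w = c_w * np.sum(cov_w * gram) / n**2
    return pn_t, u_s, p_s, p_w
```

For any sample, the four returned values should satisfy the three identities of the lemma up to floating-point error; here the expectation over the weights is computed exactly from their covariance rather than by Monte Carlo.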
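Lemma 6.2. Let $X_1,\dots,X_n$ be an i.i.d. sample with common density $s$ in $L^2(\mu)$ and let $(\psi_\lambda)_{\lambda\in\Lambda}$ 
be an orthonormal system in $L^2(\mu)$. Let $D_{s,\Lambda} = \sum_{\lambda\in\Lambda}P_s\left((\psi_\lambda - P_s\psi_\lambda)^2\right)$, 

$$U_s(\Lambda) = \frac{1}{n(n-1)}\sum_{i\neq j=1}^{n}\sum_{\lambda\in\Lambda}(\psi_\lambda(X_i) - P_s\psi_\lambda)(\psi_\lambda(X_j) - P_s\psi_\lambda),$$

$$B(\Lambda) = \Big\{\sum_{\lambda\in\Lambda}a_\lambda\psi_\lambda;\ \sum_{\lambda\in\Lambda}a_\lambda^2 \le 1\Big\}, \qquad v_{s,\Lambda}^2 = \sup_{t\in B(\Lambda)}P_s\left((t - P_st)^2\right), \qquad b_\Lambda = \sup_{t\in B(\Lambda)}\|t\|_\infty.$$

For all $\xi$ in $\{-1, 1\}$, for all $x > 0$, we have 

$$P_s\left(\xi U_s(\Lambda) > \frac{5.7\,v_{s,\Lambda}\sqrt{D_{s,\Lambda}x} + 8v_{s,\Lambda}^2x}{n} + 384\sqrt{2}\,v_{s,\Lambda}b_\Lambda\left(\frac{x}{n}\right)^{3/2} + 2040\,b_\Lambda^2\left(\frac{x}{n}\right)^2\right) \le e\,e^{-x}.$$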
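
To get a feeling for the size of the four terms in the deviation bound of Lemma 6.2, it can help to evaluate the threshold for given values of $v_{s,\Lambda}$, $D_{s,\Lambda}$, $b_\Lambda$, $n$ and $x$. The helper below is a plain transcription of the threshold in the lemma; the function name `u_statistic_threshold` is hypothetical.

```python
import math

def u_statistic_threshold(v, D, b, n, x):
    """Threshold exceeded by xi * U_s(Lambda) with probability at most e * exp(-x)."""
    gaussian_part = (5.7 * v * math.sqrt(D * x) + 8.0 * v**2 * x) / n
    mixed_part = 384.0 * math.sqrt(2.0) * v * b * (x / n) ** 1.5
    sup_part = 2040.0 * b**2 * (x / n) ** 2
    return gaussian_part + mixed_part + sup_part
```

For instance, `u_statistic_threshold(1.0, 49.0, math.sqrt(50), 100, 3.0)` evaluates the threshold for values of the order of those met with a 50-bin histogram and a bounded density; the large numerical constants make the last two terms dominate unless $x/n$ is small.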

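Lemma 6.3. Let $S$ be a linear space with finite dimension $d$ satisfying assumption H2. Let $s$ 
be a density in $L^2(\mu)\cap L^\infty(\mu)$, let $(\psi_\lambda)_{\lambda\in\Lambda}$ be an orthonormal basis of $S$. Let 

$$B(\Lambda) = \Big\{\sum_{\lambda\in\Lambda}a_\lambda\psi_\lambda;\ \sum_{\lambda\in\Lambda}a_\lambda^2 \le 1\Big\}, \qquad v_{s,\Lambda}^2 = \sup_{t\in B(\Lambda)}P_s\left((t - P_st)^2\right), \qquad b_\Lambda = \sup_{t\in B(\Lambda)}\|t\|_\infty,$$

$$D_{s,\Lambda} = \sum_{\lambda\in\Lambda}P_s\left((\psi_\lambda - P_s\psi_\lambda)^2\right) = P_s\Big(\sup_{t\in B(\Lambda)}(t - P_st)^2\Big).$$

We have 

$$v_{s,\Lambda}^2 \le \|s\|_\infty \wedge C_1\|s\|\sqrt{d}, \qquad v_{s,\Lambda}^2 \le D_{s,\Lambda} \le b_\Lambda^2 \le C_1^2d.$$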
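
For histogram models the quantities of Lemma 6.3 have closed forms, which makes the inequalities easy to check numerically. The sketch below assumes, for illustration only, that $s$ is itself a histogram on $d$ regular bins of $[0,1)$ with bin probabilities `p`, and that assumption H2 holds with $C_1 = 1$ for this model (the basis functions $\psi_k = \sqrt{d}\,1_{\text{bin } k}$ give $b_\Lambda = \sqrt{d}$). Under these assumptions the covariance matrix of $(\psi_k(X))_k$ is $d(\mathrm{diag}(p) - pp^T)$, $v_{s,\Lambda}^2$ is its largest eigenvalue and $D_{s,\Lambda}$ is its trace. The function name `histogram_constants` is hypothetical.

```python
import numpy as np

def histogram_constants(p):
    """Return (v^2, D, b^2, ||s||_inf, ||s||_2) for a histogram density with bin probabilities p."""
    p = np.asarray(p, dtype=float)
    d = len(p)
    # Covariance matrix of (psi_k(X))_k for psi_k = sqrt(d) * 1_{bin k}.
    cov = d * (np.diag(p) - np.outer(p, p))
    v2 = float(np.linalg.eigvalsh(cov)[-1])   # sup over unit-norm coefficient vectors
    big_d = float(np.trace(cov))              # sum of the variances of the psi_k(X)
    b2 = float(d)                             # squared sup-norm bound for this basis
    sup_norm = float(d * p.max())             # ||s||_inf for s = sum_k d * p_k * 1_{bin k}
    l2_norm = float(np.sqrt(d * np.sum(p**2)))
    return v2, big_d, b2, sup_norm, l2_norm
```

With the uniform density (`p = np.full(d, 1 / d)`), this returns $v_{s,\Lambda}^2 = 1 = \|s\|_\infty$ and $D_{s,\Lambda} = d - 1$, so the first inequality of the lemma is attained in that case.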



Let us now briefly explain the proof of Theorem 3.3. Let $X_1,\dots,X_n$ be an i.i.d. sample with 
common density $s$ in $L^2(\mu)\cap L^\infty(\mu)$. Let $(\psi_\lambda)_{\lambda\in\Lambda_m}$ be an orthonormal basis of $S_m$. It follows 
from Lemmas 6.1 and 6.2 that, using the notations of these lemmas, for all $x > 0$, there exists 
an absolute constant $\kappa = 2040$ such that, with probability larger than $1 - e^{-x+1}$, 

$$\|s_m - \hat s_m\|^2 \le p_W(S_m) + \kappa\left(\frac{v_{s,\Lambda_m}\sqrt{D_{s,\Lambda_m}x}}{n} + v_{s,\Lambda_m}^2\frac{x}{n} + v_{s,\Lambda_m}b_{\Lambda_m}\left(\frac{x}{n}\right)^{3/2} + b_{\Lambda_m}^2\left(\frac{x}{n}\right)^2\right). \qquad (14)$$

Since $x \ge 2$, $\sqrt{x} \le x$ and $x - 1 \ge x/2$. We have 

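$$v_{s,\Lambda_m}b_{\Lambda_m}\left(\frac{x}{n}\right)^{3/2} \le \frac{1}{2}\left(v_{s,\Lambda_m}^2\frac{x}{n} + b_{\Lambda_m}^2\left(\frac{x}{n}\right)^2\right).$$

Hence, from (14), with probability larger than $1 - e^{-x/2}$, 

$$\|s_m - \hat s_m\|^2 \le p_W(S_m) + \kappa\left(\frac{v_{s,\Lambda_m}\sqrt{D_{s,\Lambda_m}}\,x}{n} + \frac{3}{2}v_{s,\Lambda_m}^2\frac{x}{n} + \frac{3}{2}b_{\Lambda_m}^2\frac{x^2}{n^2}\right).$$

Since $\sqrt{d_m}\,x/n \le C_3$, $d_mx^2/n^2 \le C_3\sqrt{d_m}\,x/n$; from Lemma 6.3, 

$$\|s_m - \hat s_m\|^2 \le p_W(S_m) + \kappa\left[C_1\sqrt{\|s\|_\infty \wedge C_1\|s\|\sqrt{d_m}}\,\frac{\sqrt{d_m}\,x}{n} + \frac{3}{2}\left(\|s\|_\infty \wedge C_1\|s\|\sqrt{d_m} \wedge C_1^2d_m\right)\frac{x}{n} + \frac{3}{2}C_1^2C_3\frac{\sqrt{d_m}\,x}{n}\right]. \qquad (15)$$

This concludes the proof of Theorem 3.3 with $\kappa = 2040C_1(1 \vee C_1 \vee 3C_1C_3/2)$. 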
6.2 Proof of Corollary 3.4 

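We use a union bound to obtain that 

$$P_s\left(\exists m\in\mathcal{M}_n,\ \|s_m - \hat s_m\|^2 > V(m, \beta, X_1,\dots,X_n)\right) \le N_n\max_{m\in\mathcal{M}_n}P_s\left(\|s_m - \hat s_m\|^2 > V(m, \beta, X_1,\dots,X_n)\right).$$

All the models satisfy H2. From assumption H3($\mathcal{M}$, $\beta$), $x_n$ satisfies $2 \le x_n \le C_{\mathcal{M}}n/\sqrt{d_m}$ with 
$C_3 = C_{\mathcal{M}}$, thus, from Theorem 3.3, for all $m$ in $\mathcal{M}_n$, 

$$P_s\left(\|s_m - \hat s_m\|^2 > V(m, \beta, X_1,\dots,X_n)\right) \le e^{-x_n/2}.$$

Finally, $\mathrm{Card}(\mathcal{M}_n)e^{-x_n/2} \le \beta$, which concludes the proof of Corollary 3.4. 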

6.3 Proof of Theorem 3.5 

Let $s$ be a density in $L^2(\mu)\cap L^\infty(\mu)$. We only have to prove that there exists a constant $\kappa$ such 
that, with $P_s$-probability larger than $1 - \alpha$, 



Vm G A1„, pvK(5'm) < /t ( ^^ + ( 1 + V ll^lloo -^ ||s||d.^^ A dm j In 



Nr,. 



a 



Let $(\psi_\lambda)_{\lambda\in\Lambda_m}$ be an orthonormal basis of $S_m$. From Lemma 6.1 and using the notations of this 
lemma, 

$$p_W(\Lambda_m) = \frac{1}{n}P_nT_s(\Lambda_m) - \frac{1}{n}U_s(\Lambda_m).$$

We follow the proof of Theorem 3.3. From Lemmas 6.2 and 6.3 and assumptions H1, H2, 
H3($\mathcal{M}$, $\alpha$), there exists a constant $\kappa$ such that 



{ 3m e Mn, Us{Am) > i^yM^ A \\s\\dll'^ A d„ 



^ln[Nn/a] 

< a. 



n 



Moreover, it is easy to check, with the Cauchy-Schwarz inequality, that, using the notations of 
Lemma 6.3, 

$$T_s(\Lambda_m) = \sup_{t\in B(\Lambda_m)}(t - P_st)^2.$$

Hence, using assumption H2, we obtain 

$$P_nT_s(\Lambda_m) \le \|T_s(\Lambda_m)\|_\infty \le 2C_1^2d_m.$$

This concludes the proof of Theorem 3.5. 

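6.4 Proof of Lemma 3.7 

Let $X_1,\dots,X_n$ be an i.i.d. sample with common density $s$ in $L^2(\mu)\cap L^\infty(\mu)$. Let $(\psi_\lambda)_{\lambda\in\Lambda_n}$ be 
an orthonormal basis of $S_n$ such that $(\psi_\lambda)_{\lambda\in\Lambda_m}$ is an orthonormal basis of $S_m$, with $\Lambda_m \subset \Lambda_n$. 
The Hoeffding decomposition of the U-statistic $p_b(S_m, S_n)$ can be written 

$$p_b(S_m, S_n) = U_s(\Lambda_n - \Lambda_m) + 2P_n\Big(\sum_{\lambda\in\Lambda_n-\Lambda_m}(P_s\psi_\lambda)(\psi_\lambda - P_s\psi_\lambda)\Big) + \sum_{\lambda\in\Lambda_n-\Lambda_m}(P_s\psi_\lambda)^2,$$

where the last sum is equal to $\|s_n - s_m\|^2$ and where, as usual, for all index sets $\Lambda$, 

$$U_s(\Lambda) = \frac{1}{n(n-1)}\sum_{i\neq j=1}^{n}\sum_{\lambda\in\Lambda}(\psi_\lambda(X_i) - P_s\psi_\lambda)(\psi_\lambda(X_j) - P_s\psi_\lambda).$$

It follows from Lemmas 6.2 and 6.3 that, for all $2 \le x \le C_3n/\sqrt{d_n}$, 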

Ps {\Us{K - A^)\ > ^v{CuC3) (l + ^NL^NM^) ^) < 2e-^/2. 

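If $S_n = S_m$, this concludes the proof. Otherwise, let $\varepsilon$ in $(0, 1)$; the inequality $2ab \le \varepsilon a^2 + \varepsilon^{-1}b^2$ gives 

$$2(P_n - P_s)(s_n - s_m) \le \varepsilon\|s_n - s_m\|^2 + \frac{1}{\varepsilon}\left((P_n - P_s)\Big(\frac{s_n - s_m}{\|s_n - s_m\|}\Big)\right)^2.$$

The function $s_{m,n} = (s_n - s_m)/\|s_n - s_m\|$ satisfies $\|s_{m,n}\| \le 1$ and, from Bernstein's inequality, 
for all $x > 0$, 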



V n "" 6n J 

Since $s_{m,n}$ belongs to $S_n$, which satisfies H2, it follows from Lemma 6.3 that 

$$P_s\left[(s_{m,n} - P_ss_{m,n})^2\right] \le \|s\|_\infty \wedge C_1\|s\|\sqrt{d_n}, \qquad \|s_{m,n}\|_\infty \le C_1\sqrt{d_n}.$$

We conclude the proof of Lemma 3.7 by noting that $x \ge 2$ implies $2e^{-x} \le e^{-x/2}$. In this lemma, 
we proved that we can choose ^^(e, C3) = Kt,(Ci, C3) + 2e^^(2 V 2Ci V C'iC'l/^). 

6.5 Proof of Corollary 3.8 

Let Xi,...,Xn be an iid sample with common density s in B2^oo{M2, M^cO, L'^{^)). Let e in 
$(0, 1)$ and let $\Omega_n(\varepsilon)$ denote the event 

, ^ ^ \ /V\\ ^ II II jl/2Vd^Xn 1 

A union bound gives that $P_s(\Omega_n(\varepsilon)^c)$ is upper bounded by the sum over $\mathcal{M}_n$ of 



I r o o \ II Il2l ^11 ||2 I r ry \ W 11 a 11 11 il/2 V "n^^n \ 

|Pfe(5m,'S„) - ||s„ - Smil \>e\\sn-STn\\ + Kb(e, 6x) y ||s||oo A ||s|| d„' I. 



Assumption H3($\mathcal{M}$, $\beta$) ensures that $x_n$ satisfies $2 \le x_n \le C_{\mathcal{M}}n/\sqrt{d_n}$ with $C_3 = C_{\mathcal{M}}$, thus 
Lemma 3.7 gives that this last probability is upper bounded by $3e^{-x_n/2}$. Our choice of $x_n$ 
ensures that $3N_ne^{-x_n/2} \le \beta/2$ and thus that $P_s(\Omega_n(\varepsilon)^c) \le \beta/2$. The proof of Corollary 3.8 is 
concluded because, on $\Omega_n(\varepsilon)$, 

(1 - e)\\sn - Sm\? < Pb{Sm, Sn) + Kb{e, Cm)\/ \\s\\^ A ||s|| dl/'^ ^-^. 

6.6 Proof of Theorem 4.1 

The theorem is a straightforward consequence of Corollaries 3.4 and 3.8. 
6.7 Proof of Theorem 4.2 

We begin the proof with the following proposition, which shows that $\phi_n(\alpha, \beta, S_m, S_m) \ge d_m/(12n)$. 
Since $\phi_n(\alpha, \beta, S_n, S_m) \ge \phi_n(\alpha, \beta, S_m, S_m)$, the same bound also holds for $\phi_n(\alpha, \beta, S_n, S_m)$. 

Proposition 6.4. Let $S$ be the set of histograms on the partition 

$$\left(\Big[\frac{k}{d}, \frac{k+1}{d}\Big)\right)_{k=0,\dots,d-1}.$$

Let $X_1,\dots,X_n$ be an i.i.d. sample. Let $\alpha$, $\beta$ be real numbers in $(0, 1)$ such that $\alpha + \beta < 1$. Assume 
that $d \ge 3 + 18\log(\sqrt{2}/(1 - \alpha - \beta))$, then 

$$\phi_n(\alpha, \beta, S, S) \ge \frac{d}{12n}.$$

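The proof is decomposed into two lemmas. 
Lemma 6.5. Let $B_\beta = B_2(\hat s, \rho_\beta, S)$ in $CS(S, \beta)$ and let $\rho_{\alpha,\beta}$ be a real number such that 

$$\forall s\in S, \quad P_s(\rho_\beta \le \rho_{\alpha,\beta}) \ge 1 - \alpha.$$

Then, 

$$\forall s\in S, \quad P_s(\|s - \hat s\| > \rho_{\alpha,\beta}) \le \alpha + \beta. \qquad (16)$$

Proof of Lemma 6.5. 

$$P_s\left[\|s - \hat s\| > \rho_{\alpha,\beta}\right] = P_s\left[\|s - \hat s\| > \rho_{\alpha,\beta} \cap \rho_{\alpha,\beta} \ge \rho_\beta\right] + P_s\left[\|s - \hat s\| > \rho_{\alpha,\beta} \cap \rho_{\alpha,\beta} < \rho_\beta\right]$$
$$\le P_s\left[\|s - \hat s\| > \rho_\beta\right] + P_s\left[\rho_{\alpha,\beta} < \rho_\beta\right] \le \alpha + \beta.$$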



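Lemma 6.6. Let $\delta = \alpha + \beta$ and let $\rho_\delta$ be any real number satisfying (16). Then we have 

$$\rho_\delta^2 \ge \frac{d-1}{2n} - \frac{1}{n}\sqrt{2(d+1)\ln\left(\frac{\sqrt{1 + (d+1)n^{-1}}}{1-\delta}\right)}.$$

Remark: When $d \ge 3 + 18\log(\sqrt{2}/(1 - \delta))$ and $n \ge d + 1$, we have 

$$\sqrt{2(d+1)\ln\left(\frac{\sqrt{1 + (d+1)n^{-1}}}{1-\delta}\right)} \le \frac{d-1}{3},$$

thus $\rho_\delta^2 \ge (d-1)/(6n) \ge d/(12n)$. 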
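
The arithmetic of the remark can be checked numerically. The sketch below evaluates the lower bound of Lemma 6.6 and compares it with $(d-1)/(6n)$ and $d/(12n)$; it assumes that $\log$ and $\ln$ both denote the natural logarithm, and the function name `radius_lower_bound` is hypothetical.

```python
import math

def radius_lower_bound(d, n, delta):
    """Lower bound on the squared radius rho_delta^2 stated in Lemma 6.6."""
    t = math.log(math.sqrt(1.0 + (d + 1) / n) / (1.0 - delta))
    return (d - 1) / (2.0 * n) - math.sqrt(2.0 * (d + 1) * t) / n

def smallest_admissible_dimension(delta):
    """Smallest integer d with d >= 3 + 18 log(sqrt(2) / (1 - delta))."""
    return math.ceil(3.0 + 18.0 * math.log(math.sqrt(2.0) / (1.0 - delta)))
```

For example, with $\delta = 0.1$ the smallest admissible dimension is $d = 12$, and `radius_lower_bound(12, 13, 0.1)` is indeed larger than $(d-1)/(6n)$.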
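Proof: We prove that if 

$$\rho_\delta^2 < \frac{d-1}{2n} - \frac{1}{n}\sqrt{2(d+1)\ln\left(\frac{\sqrt{1 + (d+1)n^{-1}}}{1-\delta}\right)},$$

then 

$$\inf_{s\in S}P_s\left[\|s - \hat s\| \le \rho_\delta\right] < 1 - \delta.$$

Let $s_0 = 1_{[0,1)}$, $\Lambda = \{1,\dots,[d/2]\}$ and for all $\lambda$ in $\Lambda$, let 

$$\psi_\lambda = \sqrt{\frac{d}{2}}\left(1_{[2(\lambda-1)/d,\,(2\lambda-1)/d)} - 1_{[(2\lambda-1)/d,\,2\lambda/d)}\right).$$

It is easy to check that $(\psi_\lambda)_{\lambda\in\Lambda}$ is an orthonormal system in $S$, orthogonal to $s_0$, such that, for 
all $\lambda$ in $\Lambda$, $\|\psi_\lambda\|_\infty = \sqrt{d/2}$. Let $\hat s_0 = \int\hat ss_0\,d\mu$ and for all $\lambda$ in $\Lambda$, let 

$$\hat s_\lambda = \int\hat s\psi_\lambda\,d\mu.$$

Let $(\xi_\lambda)_{\lambda\in\Lambda}$ be independent Rademacher random variables, independent of $X_1,\dots,X_n$, let $\rho$ be 
some real number to be chosen later and let $s_\xi = s_0 + \rho\sum_{\lambda\in\Lambda}\xi_\lambda\psi_\lambda$. The $\psi_\lambda$ have distinct 
supports, thus $\|\sum_{\lambda\in\Lambda}\xi_\lambda\psi_\lambda\|_\infty \le \sqrt{d/2}$ and $s_\xi$ is a density if 

$$\rho^2 \le \frac{2}{d}. \qquad (17)$$

Assume that (17) holds, then 

$$\inf_{s\in S}P_s\left[\|s - \hat s\| \le \rho_\delta\right] \le P_{s_\xi}\left[\|s_\xi - \hat s\| \le \rho_\delta\right]. \qquad (18)$$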



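We have 

$$\|s_\xi - \hat s\|^2 \ge (1 - \hat s_0)^2 + \sum_{\lambda\in\Lambda}(\rho\xi_\lambda - \hat s_\lambda)^2 \ge \sum_{\lambda\in\Lambda,\ \rho\xi_\lambda\hat s_\lambda\le0}\left(\rho^2 - 2\rho\xi_\lambda\hat s_\lambda + \hat s_\lambda^2\right) \ge \rho^2N(\xi, \hat s), \qquad (19)$$

where $N(\xi, \hat s) = \mathrm{Card}(\{\lambda\in\Lambda,\ \rho\xi_\lambda\hat s_\lambda \le 0\}) = \sum_{\lambda\in\Lambda}1_{\{\rho\xi_\lambda\hat s_\lambda\le0\}}$. If we plug (19) in (18), we 
obtain 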



infl 



\S - S\\2 < ps] < / ^p2N{^.s)<psS^dp. 

Jo 
We integrate with respect to $\xi$ and we apply Fubini's theorem to obtain 

infP4||s-J||2<p^] <Fs^[p^NiC,s)<pj] =<J E^(l^.^(^_,)<^2S5)dAi. 
From Cauchy-Schwarz inequality. 



% [h'Ni^,s)<pjSi) < n (r^le. ^) < Ps) IE« (4) ' 
and Egs| = Sq + p^ Z^AeA V'v ^^^ ^^^ ^ i"^ ^' /o V'a = 1' ^^^s 

/ K^sjdp = l + p- 
Jo 



(20) 



(211 



(22) 



Moreover, conditionally on $\hat s$, $N(\xi, \hat s)$ is a sum of $[d/2]$ independent random variables valued in 
$\{0, 1\}$. Thus, from Hoeffding's inequality, 



Vt>0, F^[N{^,s)<E^{N{C,s)) 



t\ <e 



-2t 



(23) 
In (23), we have $E_\xi(N(\xi, \hat s)) = \sum_{\lambda\in\Lambda}P_\xi(\rho\xi_\lambda\hat s_\lambda \le 0) \ge [d/2]/2$ and we choose 

t = In 



VTTTWm 



1-6 



P 



2 2 

- < A/-- 
n \ d 



Since (d - l)/2 < [d/2] <{d+ l)/2 

t < In 
Thus 
Hence, from 



^l + {d+l)/n 
1-5 



E^iN{^,s))> 



d-1 



{p^N{^, s) < pj} C {N{^, s) < E^ (iV(e, s)) - ^/WW}- 



^^(^^^(^'^")^^^)^TT7M- 



(24) 



We plug inequalities (22) and (24) in (21) to obtain 

"1 



eHi 







•^ \'-dpm{^,s)<f 



sA<{i-6r. 



Thus, from (20) and Jensen's inequality, 

inf Fs \ 
ses ^ 



\s - s\\2 < ps] < 'i- - S. 



We already know thanks to Proposition 6.4 that $\phi_n(\alpha, \beta, S_n, S_m) \ge d_m/(12n)$; therefore, it 
remains to prove that $\phi_n(\alpha, \beta, S_n, S_m) \ge c\sqrt{d_n}/n$. Let $s_0 = 1_{[0,1]}$, let $B_\beta = B_2(\hat s, \rho_\beta, S_n)$ be a 
confidence ball in $CS(S_n, \beta)$ and let $\rho_{\alpha,\beta} > 0$ be such that for all densities $s$ in $S_m$, 

$$P_s(\rho_\beta \le \rho_{\alpha,\beta}) \ge 1 - \alpha.$$

We will prove that $\rho_{\alpha,\beta}^2 \ge c\sqrt{d_n}/n$, which is sufficient to prove Theorem 4.2. We decompose 
the proof into two lemmas. 

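Lemma 6.7. Let $S_n(\rho_{\alpha,\beta}) = \{t\in S_n;\ \|t - s_0\|_2 > 2\rho_{\alpha,\beta}\}$. There exists a test $T$ of null hypothesis 
$H_0 : s = s_0$ against the alternative $H_1 : s\in S_n(\rho_{\alpha,\beta})$ with confidence level more than $1 - \beta$ 
and power more than $1 - \alpha - \beta$, i.e. such that 

$$P_{s_0}(T = 0) \ge 1 - \beta, \qquad \inf_{s\in S_n(\rho_{\alpha,\beta})}P_s(T = 1) \ge 1 - (\alpha + \beta).$$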

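Proof of Lemma 6.7. Let $T = 1_{s_0\notin B_\beta}$. Since $s_0$ belongs to $S_n$ and $B_\beta$ belongs to $CS(S_n, \beta)$, 
$P_{s_0}(T = 0) \ge 1 - \beta$. Moreover, for all $s$ in $S_n(\rho_{\alpha,\beta})$, 

$$P_s(T = 0) = P_s(s_0\in B_\beta) = P_s(\|s_0 - \hat s\| \le \rho_\beta) \le P_s(\|s_0 - s\| - \|s - \hat s\| \le \rho_\beta) \le P_s(\|s - \hat s\| > 2\rho_{\alpha,\beta} - \rho_\beta).$$

This last probability is equal to 

$$P_s(\|s - \hat s\| > 2\rho_{\alpha,\beta} - \rho_\beta \cap \rho_\beta > \rho_{\alpha,\beta}) + P_s(\|s - \hat s\| > 2\rho_{\alpha,\beta} - \rho_\beta \cap \rho_\beta \le \rho_{\alpha,\beta})$$
$$\le P_s(\rho_\beta > \rho_{\alpha,\beta}) + P_s(\|s - \hat s\| > \rho_\beta) \le \alpha + \beta. \qquad \square$$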
The second lemma gives the separation rate for the test of null hypothesis $H_0 : s = s_0$. 

Lemma 6.8. Let $\eta = 2(1 - 2\alpha - \beta)$, let $\rho > 0$. Let $\Theta_\alpha$ be the set of tests $T_\alpha$ with confidence 
level $\alpha$, of null hypothesis $H_0 : s = s_0$ against the alternative $H_1 : s\in S_n(\rho)$, where $S_n(\rho)$ is the 
set of all densities $s$ in $S_n$ such that $\|s - s_0\| \ge \rho$. 
Let $\beta(S_n(\rho)) = \inf_{T_\alpha\in\Theta_\alpha}\sup_{s\in S_n(\rho)}P_s(T_\alpha = 0)$. 
If $d_n \ge 10$ and $\rho^2 \le \sqrt{\ln(1 + \eta^2)/3.2}\,\sqrt{d_n - 1}/n$, then $\beta(S_n(\rho)) \ge \beta + \alpha$. 

Comments: From Lemmas 6.7 and 6.8 we deduce that 



2 ^ / ln(l + r?2) ./d~^ ^ Vln(l + t?^) ^/g; 

^-'^ - V ^ 4^^ - n ~- 

Thus the proof of Lemma 6.8 concludes the proof of Theorem 4.2. 

Proof of Lemma 6.8. The function $\beta(S_n(\rho))$ is non-increasing with $\rho$. Thus we take 

$$\rho^2 = \sqrt{\ln(1 + \eta^2)/3.2}\,\sqrt{d_n - 1}/n$$

and we will prove that $\beta(S_n(\rho)) \ge \alpha + \beta$. Let $\mu_\rho$ be a probability measure on $S_n(\rho)$, let 
$P_{\mu_\rho} = \int P_s\,d\mu_\rho$. 


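$$\beta(S_n(\rho)) \ge \inf_{T_\alpha\in\Theta_\alpha}P_{\mu_\rho}(T_\alpha = 0) \ge 1 - \alpha + \inf_{T_\alpha\in\Theta_\alpha}\left(P_{\mu_\rho}(T_\alpha = 0) - P_{s_0}(T_\alpha = 0)\right)$$
$$\ge 1 - \alpha - \sup_{A:\ P_{s_0}(A)\le\alpha}\left|P_{\mu_\rho}(A) - P_{s_0}(A)\right| \ge 1 - \alpha - \frac{1}{2}\left\|P_{\mu_\rho} - P_{s_0}\right\|_{TV}, \qquad (25)$$

where $\|\cdot\|_{TV}$ denotes the total variation distance. Assume that $P_{\mu_\rho}$ is absolutely continuous with 
respect to $P_{s_0}$. Let $L_{\mu_\rho} = dP_{\mu_\rho}/dP_{s_0}$, then 

$$\left\|P_{\mu_\rho} - P_{s_0}\right\|_{TV} = E_{s_0}\left|L_{\mu_\rho}(X_1,\dots,X_n) - 1\right| \le \left(E_{s_0}\left(L_{\mu_\rho}^2\right) - 1\right)^{1/2}, \qquad (26)$$

and then 

$$\beta(S_n(\rho)) \ge 1 - \alpha - \frac{1}{2}\left(E_{s_0}\left(L_{\mu_\rho}^2\right) - 1\right)^{1/2}. \qquad (27)$$

From (27), $\beta(S_n(\rho)) \ge \alpha + \beta$ if $E_{s_0}(L_{\mu_\rho}^2) \le 1 + \eta^2$. Let us now give a probability measure on 
$S_n(\rho)$, absolutely continuous with respect to $P_{s_0}$, such that $E_{s_0}(L_{\mu_\rho}^2) \le 1 + \eta^2$. 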
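Let $(\psi_\lambda)_{\lambda=1,\dots,[d_n/2]}$ be the following orthonormal system. Let $\psi_0 = s_0$, $\phi = 1_{[0,1/2)} - 1_{[1/2,1)}$ and 
for all $\lambda = 1,\dots,[d_n/2]$, $\psi_\lambda(x) = \sqrt{d_n/2}\,\phi(d_nx/2 - (\lambda - 1))$. Let $\xi = (\xi_\lambda)_{\lambda=1,\dots,[d_n/2]}$ be independent 
Rademacher random variables and let $\mu_\rho$ be the distribution of $s_\xi = s_0 + \rho\sum_{\lambda=1}^{[d_n/2]}\xi_\lambda\psi_\lambda/\sqrt{[d_n/2]}$. 
Let us check that $\mu_\rho$ satisfies the required properties. The functions $(\psi_\lambda)_{\lambda=1,\dots,[d_n/2]}$ have distinct 
supports, thus 

$$\Big\|\sum_{\lambda=1}^{[d_n/2]}\xi_\lambda\psi_\lambda\Big\|_\infty \le \sqrt{d_n/2}.$$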
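$s_\xi$ is a real density if $\rho \le 1$. Since $2\alpha + \beta < 1$, $\eta^2 < 4$ and $\ln(1 + \eta^2) < \ln(5)$. $\sqrt{d_n} \le n$, hence 

$$\rho^2 \le \sqrt{\frac{\ln(5)}{3.2}}\,\frac{\sqrt{d_n - 1}}{n} \le 1.$$

Since $(\psi_\lambda)_{\lambda=1,\dots,[d_n/2]}$ is an orthonormal system, $\|s_\xi - s_0\| = \rho$, thus $s_\xi$ belongs to $S_n(\rho)$ and $\mu_\rho$ 
is a law on $S_n(\rho)$. Moreover, $P_{\mu_\rho}$ is absolutely continuous with respect to $P_{s_0}$. Thus 

$$L_{\mu_\rho}(x_1,\dots,x_n) = \frac{1}{2^{[d_n/2]}}\sum_{\xi\in\{-1,1\}^{[d_n/2]}}\prod_{a=1}^{n}\Big(1 + \frac{\rho}{\sqrt{[d_n/2]}}\sum_{\lambda=1}^{[d_n/2]}\xi_\lambda\psi_\lambda(x_a)\Big).$$

Hereafter, in order to simplify the notations, we write $\sum_\xi$ instead of $\sum_{\xi\in\{-1,1\}^{[d_n/2]}}$ and $\sum_\lambda$ 
instead of $\sum_{\lambda=1}^{[d_n/2]}$. Let $\psi(\rho, \xi) = \rho\sum_\lambda\xi_\lambda\psi_\lambda/\sqrt{[d_n/2]}$, we have 

$$L_{\mu_\rho}^2(x_1,\dots,x_n) = \frac{1}{2^{2[d_n/2]}}\sum_{\xi,\xi'}\prod_{a=1}^{n}\left(1 + \psi(\rho, \xi)(x_a)\right)\left(1 + \psi(\rho, \xi')(x_a)\right).$$

For all $\lambda \neq \lambda' = 1,\dots,[d_n/2]$, $\psi_\lambda\psi_{\lambda'} = 0$, thus 

$$\left(1 + \psi(\rho, \xi)\right)\left(1 + \psi(\rho, \xi')\right) = 1 + \psi(\rho, \xi) + \psi(\rho, \xi') + \frac{\rho^2}{[d_n/2]}\sum_\lambda\xi_\lambda\xi'_\lambda\psi_\lambda^2.$$

For all $\lambda = 1,\dots,[d_n/2]$ and all $a = 1,\dots,n$, $P_{s_0}(\psi_\lambda) = 0$, $P_{s_0}(\psi_\lambda^2) = 1$, thus 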



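$$E_{s_0}\left(L_{\mu_\rho}^2\right) = \frac{1}{2^{2[d_n/2]}}\sum_{\xi,\xi'}\Big(1 + \frac{\rho^2}{[d_n/2]}\sum_\lambda\xi_\lambda\xi'_\lambda\Big)^n$$
$$= \frac{1}{2^{2[d_n/2]}}\sum_\xi\sum_{l=0}^{[d_n/2]}\ \sum_{\xi';\ \mathrm{Card}\{\lambda,\ \xi'_\lambda=\xi_\lambda\}=l}\Big(1 + \frac{\rho^2(2l - [d_n/2])}{[d_n/2]}\Big)^n$$
$$= \frac{1}{2^{[d_n/2]}}\sum_{l=0}^{[d_n/2]}\binom{[d_n/2]}{l}\Big(1 + \frac{\rho^2(2l - [d_n/2])}{[d_n/2]}\Big)^n.$$

For all real numbers $u \ge -1$, we have $0 \le 1 + u \le e^u$, thus $(1 + u)^n \le e^{nu}$. Since $\rho^2 \le 1$, we 
can apply this inequality to all the $u_l = (2l/[d_n/2] - 1)\rho^2$ and we obtain 

$$E_{s_0}\left(L_{\mu_\rho}^2\right) \le \frac{e^{-n\rho^2}}{2^{[d_n/2]}}\sum_{l=0}^{[d_n/2]}\binom{[d_n/2]}{l}\exp\Big(\frac{2n\rho^2l}{[d_n/2]}\Big) = e^{-n\rho^2}\left(\frac{1 + \exp\left(2n\rho^2/[d_n/2]\right)}{2}\right)^{[d_n/2]}.$$

Thus, $E_{s_0}(L_{\mu_\rho}^2) \le 1 + \eta^2$ if 

$$-n\rho^2 + [d_n/2]\ln\left(1 + \frac{\exp\left(2n\rho^2/[d_n/2]\right) - 1}{2}\right) \le \ln(1 + \eta^2).$$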
For all positive $u$, $\ln(1 + u) \le u$, thus, we only have to prove that 

$$-n\rho^2 + \frac{[d_n/2]}{2}\left(\exp\Big(\frac{2n\rho^2}{[d_n/2]}\Big) - 1\right) \le \ln(1 + \eta^2).$$

$[d_n/2] \ge (d_n - 1)/2$ and $d_n \ge 10$, thus 

$$\frac{2n\rho^2}{[d_n/2]} \le 2\sqrt{\frac{\ln(1 + \eta^2)}{3.2}}\,\frac{\sqrt{d_n - 1}}{[d_n/2]} \le \frac{4 \times 0.71}{\sqrt{d_n - 1}} \le 1.$$

For all real numbers $x$ in $[0, 1]$, we have $e^x \le 1 + x + 3.2(x/2)^2$, thus $\exp(2n\rho^2/[d_n/2]) - 1 \le 
2n\rho^2/[d_n/2] + 3.2\,(n\rho^2/[d_n/2])^2$. Hence 

$$-n\rho^2 + \frac{[d_n/2]}{2}\left(\exp\Big(\frac{2n\rho^2}{[d_n/2]}\Big) - 1\right) \le \frac{1.6\,\rho^4n^2}{[d_n/2]} \le \frac{d_n - 1}{2[d_n/2]}\ln(1 + \eta^2) \le \ln(1 + \eta^2). \qquad \square$$
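
The final step of the proof of Lemma 6.8 can be checked numerically. The sketch below evaluates the second moment of the likelihood ratio through the binomial sum $2^{-[d_n/2]}\sum_l\binom{[d_n/2]}{l}\left(1 + \rho^2(2l - [d_n/2])/[d_n/2]\right)^n$, which is the expression obtained by expanding the product over pairs $(\xi, \xi')$ for the Rademacher mixture considered in the proof, and compares it with $1 + \eta^2$ at the critical radius $\rho^2 = \sqrt{\ln(1+\eta^2)/3.2}\,\sqrt{d_n - 1}/n$. The function names are hypothetical and the parameter values in the usage note are arbitrary.

```python
import math

def second_moment(d_n, n, rho2):
    """Binomial-sum expression of E_{s_0}(L^2) for the Rademacher mixture."""
    m = d_n // 2
    total = 0.0
    for l in range(m + 1):
        # l counts the coordinates on which xi and xi' agree.
        total += math.comb(m, l) * (1.0 + rho2 * (2 * l - m) / m) ** n
    return total / 2**m

def critical_radius(d_n, n, alpha, beta):
    """Return (rho^2, eta) at the separation radius used in the proof of Lemma 6.8."""
    eta = 2.0 * (1.0 - 2.0 * alpha - beta)
    rho2 = math.sqrt(math.log(1.0 + eta**2) / 3.2) * math.sqrt(d_n - 1) / n
    return rho2, eta
```

For example, with $d_n = 10$, $n = 100$ and $\alpha = \beta = 0.1$, `second_moment` returns a value close to 1.34, well below $1 + \eta^2 = 2.96$; the bound of the proof is therefore not tight in this configuration.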



7 Appendix 

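7.1 Proof of Lemma 6.1 

$\sum_{i=1}^n(W_i - \bar W_n) = 0$, thus, for all $\lambda$ in $\Lambda$, $(P_n^W - \bar W_nP_n)(P_s\psi_\lambda) = 0$. Moreover, since the weights 
are exchangeable, 

$$0 = E\Big[\Big(\sum_{i=1}^n(W_i - \bar W_n)\Big)^2\Big] = \sum_{i=1}^nE\left((W_i - \bar W_n)^2\right) + \sum_{i\neq j}E\left((W_i - \bar W_n)(W_j - \bar W_n)\right)$$
$$= nE\left((W_1 - \bar W_n)^2\right) + n(n-1)E\left((W_1 - \bar W_n)(W_2 - \bar W_n)\right).$$

Thus, 

$$v_W^2 = E\left((W_1 - \bar W_n)^2\right) = -(n-1)E\left((W_1 - \bar W_n)(W_2 - \bar W_n)\right).$$

Hence, 

$$p_W(\Lambda) = \sum_{\lambda\in\Lambda}\frac{E_W\left(\left[(P_n^W - \bar W_nP_n)(\psi_\lambda)\right]^2\right)}{v_W^2} = \sum_{\lambda\in\Lambda}\frac{E_W\left(\left[(P_n^W - \bar W_nP_n)(\psi_\lambda - P_s\psi_\lambda)\right]^2\right)}{v_W^2}$$
$$= \sum_{\lambda\in\Lambda}\frac{1}{n^2}\sum_{i,j=1}^n\frac{E\left((W_i - \bar W_n)(W_j - \bar W_n)\right)}{v_W^2}(\psi_\lambda(X_i) - P_s\psi_\lambda)(\psi_\lambda(X_j) - P_s\psi_\lambda)$$
$$= \frac{1}{n^2}\sum_{\lambda\in\Lambda}\sum_{i=1}^n(\psi_\lambda(X_i) - P_s\psi_\lambda)^2 - \frac{1}{n^2(n-1)}\sum_{\lambda\in\Lambda}\sum_{i\neq j=1}^n(\psi_\lambda(X_i) - P_s\psi_\lambda)(\psi_\lambda(X_j) - P_s\psi_\lambda)$$
$$= \frac{1}{n}\left(P_nT(\Lambda) - U_s(\Lambda)\right). \qquad (28)$$

On the other hand, easy algebra leads to 

$$\|s_m - \hat s_m\|^2 = \sum_{\lambda\in\Lambda}\left[(P_n - P_s)(\psi_\lambda)\right]^2 = \frac{1}{n}\left(P_nT(\Lambda) + (n-1)U_s(\Lambda)\right).$$

Thus, we have $\|s_m - \hat s_m\|^2 - p_W(\Lambda) = U_s(\Lambda)$. 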



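7.2 Proof of Lemma 6.2 

We apply Theorem 3.4 in Houdré & Reynaud-Bouret [15]. For all $x > 0$, 

$$P_s\left(\xi U_s(\Lambda) > \frac{1}{n^2}\left(5.7B_1\sqrt{x} + 8B_2x + 384B_3x^{3/2} + 1020B_4x^2\right)\right) \le e\,e^{-x}, \qquad (29)$$

where 

$$U(x, y) = \sum_{\lambda\in\Lambda}(\psi_\lambda(x) - P_s\psi_\lambda)(\psi_\lambda(y) - P_s\psi_\lambda),$$

$$B_1^2 = n^2E\left[(U(X_1, X_2))^2\right], \qquad B_3^2 = n\sup_xE\left[(U(x, X_2))^2\right], \qquad B_4 = \sup_{x,y}U(x, y),$$

$$B_2 = \sup E\Big[\sum_{i=1}^n\sum_{j=1}^{i-1}U(X_1, X_2)a_i(X_1)\beta_j(X_2)\Big],$$

the supremum being taken over the functions $(a_i)$ and $(\beta_j)$ such that $\sum_iP_sa_i^2 \le 1$ and $\sum_jP_s\beta_j^2 \le 1$. 
From the Cauchy-Schwarz inequality, for all real numbers $(b_\lambda)_{\lambda\in\Lambda}$, 

$$\sum_{\lambda\in\Lambda}b_\lambda^2 = \Big(\sup_{\sum a_\lambda^2\le1}\sum_{\lambda\in\Lambda}a_\lambda b_\lambda\Big)^2. \qquad (30)$$

In particular, since the system $(\psi_\lambda)_{\lambda\in\Lambda}$ is orthonormal, for all $x$ in $\mathbb{R}$, $T(\Lambda)(x) = (\sup_{t\in B(\Lambda)}(t(x) - P_st))^2$. Thus 

$$\|T(\Lambda)\|_\infty \le 2b_\Lambda^2. \qquad (31)$$

Let us now evaluate $B_1$, $B_2$, $B_3$ and $B_4$. 
Evaluation of $B_1$: 

$$\frac{B_1^2}{n^2} = \sum_{\lambda,\lambda'\in\Lambda}\left(P_s\left((\psi_\lambda - P_s\psi_\lambda)(\psi_{\lambda'} - P_s\psi_{\lambda'})\right)\right)^2 = \sum_{\lambda\in\Lambda}\Big(\sup_{\sum a_{\lambda'}^2\le1}P_s\Big((\psi_\lambda - P_s\psi_\lambda)\sum_{\lambda'\in\Lambda}a_{\lambda'}(\psi_{\lambda'} - P_s\psi_{\lambda'})\Big)\Big)^2$$
$$= \sum_{\lambda\in\Lambda}\Big(\sup_{t\in B(\Lambda)}P_s\left((\psi_\lambda - P_s\psi_\lambda)(t - P_st)\right)\Big)^2 \le D_{s,\Lambda}v_{s,\Lambda}^2,$$

where we use successively the independence of $X_1$ and $X_2$, inequality (30), the orthonormality 
of the system $(\psi_\lambda)_{\lambda\in\Lambda}$ and the Cauchy-Schwarz inequality. Thus we obtain 

$$B_1 \le n\,v_{s,\Lambda}\sqrt{D_{s,\Lambda}}. \qquad (32)$$

Evaluation of $B_2$: For all real numbers $y$, $z$, we have $2yz \le y^2 + z^2$, thus, for all $i, j$ in $\{1,\dots,n\}$, 

$$2P_s\left((\psi_\lambda - P_s\psi_\lambda)a_i\right)P_s\left((\psi_\lambda - P_s\psi_\lambda)\beta_j\right) \le \left(P_s\left((\psi_\lambda - P_s\psi_\lambda)a_i\right)\right)^2 + \left(P_s\left((\psi_\lambda - P_s\psi_\lambda)\beta_j\right)\right)^2.$$

We apply (30) with $b_\lambda = P_s((\psi_\lambda - P_s\psi_\lambda)a_i)$; since the system $(\psi_\lambda)_{\lambda\in\Lambda}$ is orthonormal, for all $i$ 
in $\{1,\dots,n\}$, 

$$\sum_{\lambda\in\Lambda}\left(P_s\left((\psi_\lambda - P_s\psi_\lambda)a_i\right)\right)^2 = \Big(\sup_{t\in B(\Lambda)}P_s\left((t - P_st)a_i\right)\Big)^2 \le v_{s,\Lambda}^2P_sa_i^2.$$

Since $\sum_{i=1}^nP_sa_i^2 \le 1$, we deduce that 

$$\sum_{i,j=1}^n\sum_{\lambda\in\Lambda}\left(P_s\left((\psi_\lambda - P_s\psi_\lambda)a_i\right)\right)^2 \le n\,v_{s,\Lambda}^2.$$

The same inequality holds for $\beta_j$, thus we obtain 

$$B_2 \le n\,v_{s,\Lambda}^2. \qquad (33)$$

Evaluation of $B_3$: For all $x$ in $\mathbb{R}$, $E[(U(x, X_2))^2]$ is the variance of the function $t_x = \sum_{\lambda\in\Lambda}(\psi_\lambda(x) - P_s\psi_\lambda)\psi_\lambda$. 
$t_x$ is a function in the linear space $S$ spanned by the $(\psi_\lambda)_{\lambda\in\Lambda}$ and, from inequality (30), 

$$\|t_x\|^2 = \sum_{\lambda\in\Lambda}(\psi_\lambda(x) - P_s\psi_\lambda)^2 = \Big(\sup_{t\in B(\Lambda)}(t(x) - P_st)\Big)^2 \le 2b_\Lambda^2.$$

Thus $E[(U(x, X_2))^2] = \mathrm{Var}(t_x(X)) = 2b_\Lambda^2\,\mathrm{Var}(t_x(X)/(\sqrt{2}\,b_\Lambda)) \le 2b_\Lambda^2v_{s,\Lambda}^2$. Thus 

$$B_3 \le \sqrt{2n}\,b_\Lambda v_{s,\Lambda}. \qquad (34)$$

Evaluation of $B_4$: We apply the Cauchy-Schwarz inequality and we obtain 

$$B_4 \le \|T(\Lambda)\|_\infty \le 2b_\Lambda^2. \qquad (35)$$

Let $\Omega_x$ be the event appearing in inequality (29). From (32), (33), (34) and (35), on $\Omega_x^c$, 

$$\xi U_s(\Lambda) \le \frac{5.7\,v_{s,\Lambda}\sqrt{D_{s,\Lambda}x}}{n} + \frac{8v_{s,\Lambda}^2x}{n} + 384\sqrt{2}\,v_{s,\Lambda}b_\Lambda\left(\frac{x}{n}\right)^{3/2} + 2040\,b_\Lambda^2\left(\frac{x}{n}\right)^2.$$

7.3 Proof of Lemma 6.3 

It follows from Assumption H2 that 

$$b_\Lambda \le C_1\sqrt{d}.$$

It follows from (30) that 

$$D_{s,\Lambda} \le \sum_{\lambda\in\Lambda}P_s(\psi_\lambda^2) = P_s\Big(\Big(\sup_{t\in B(\Lambda)}t\Big)^2\Big) \le \Big\|\sup_{t\in B(\Lambda)}t\Big\|_\infty^2 \le C_1^2d.$$

$v_{s,\Lambda}^2 \le \sup_{t\in B(\Lambda)}P_st^2$, thus 

$$v_{s,\Lambda}^2 \le b_\Lambda^2 \le C_1^2d, \qquad v_{s,\Lambda}^2 \le \|s\|_\infty\sup_{t\in B(\Lambda)}\|t\|^2 = \|s\|_\infty.$$

Finally, for all $t$ in $B(\Lambda)$, 

$$P_st^2 \le \|t\|_\infty P_s|t| \le \|t\|_\infty\|t\|\|s\| \le C_1\sqrt{d}\,\|s\|.$$

Thus $v_{s,\Lambda}^2 \le C_1\sqrt{d}\,\|s\|$. 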



References 

[1] S. Arlot. Model selection by resampling penalization. Electron. J. Statist., 3:557-624, 2009. 

[2] S. Arlot, G. Blanchard, and E. Roquain. Resampling-based confidence regions and multiple 
tests for a correlated random vector. In Learning theory, volume 4539 of Lecture Notes in 
Comput. Sci., pages 127-141. Springer, Berlin, 2007. 

[3] S. Arlot and P. Massart. Data-driven calibration of penalties for least-squares regression. 
Journal of Machine learning research, 10:245-279, 2009. 

[4] Y. Baraud. Confidence balls in Gaussian regression. Ann. Statist., 32(2):528-551, 2004. 

[5] R. Beran. REACT scatterplot smoothers: superefficiency through basis economy. J. Amer. 
Statist. Assoc., 95(449):155-171, 2000. 

[6] R. Beran and L. Dümbgen. Modulation of estimators and confidence sets. Ann. Statist., 
26(5):1826-1856, 1998. 

[7] L. Birgé and P. Massart. From model selection to adaptive estimation. In Festschrift for 
Lucien Le Cam, pages 55-87. Springer, New York, 1997. 

[8] L. Birgé and P. Massart. Minimal penalties for Gaussian model selection. Probab. Theory 
Related Fields, 138(1-2):33-73, 2007. 

[9] T. Cai and M. G. Low. Adaptive confidence balls. Ann. Statist., 34(1):202-228, 2006. 

[10] B. Efron. Bootstrap methods: another look at the jackknife. Ann. Statist., 7(1):1-26, 1979. 

[11] M. Fromont and B. Laurent. Adaptive goodness-of-fit tests in a density model. Ann. 
Statist., 34(2):680-720, 2006. 

[12] C. Genovese and L. Wasserman. Adaptive confidence bands. Ann. Statist., 36(2):875-905, 
2008. 

[13] C. R. Genovese and L. Wasserman. Confidence sets for nonparametric wavelet regression. 
Ann. Statist., 33(2):698-729, 2005. 

[14] M. Hoffmann and O. Lepski. Random rates in anisotropic regression. Ann. Statist., 
30(2):325-396, 2002. With discussions and a rejoinder by the authors. 

[15] C. Houdré and P. Reynaud-Bouret. Exponential inequalities, with constants, for 
U-statistics of order two. In Stochastic inequalities and applications, volume 56 of Progr. 
Probab., pages 55-69. Birkhäuser, Basel, 2003. 

[16] Y. I. Ingster. Asymptotically minimax hypothesis testing for nonparametric alternatives. 
I. Math. Methods Statist., 2(2):85-114, 1993. 

[17] Y. I. Ingster. Asymptotically minimax hypothesis testing for nonparametric alternatives. 
II. Math. Methods Statist., 2(3):171-189, 1993. 

[18] Y. I. Ingster. Asymptotically minimax hypothesis testing for nonparametric alternatives. 
III. Math. Methods Statist., 2(4):249-268, 1993. 

[19] A. Juditsky and S. Lambert-Lacroix. Nonparametric confidence set estimation. Math. 
Methods Statist., 12(4):410-428 (2004), 2003. 




[20] A. Juditsky and O. Lepski. Evaluation of the accuracy of nonparametric estimators. Math. 
Methods Statist., 10(4):422-445 (2002), 2001. Meeting on Mathematical Statistics 
(Marseille, 2000). 

[21] B. Laurent. Estimation of integral functionals of a density. Ann. Statist., 24(2):659-681, 
1996. 

[22] B. Laurent. Adaptive estimation of a quadratic functional of a density by model selection. 
ESAIM Probab. Stat., 9:1-18 (electronic), 2005. 

[23] O. V. Lepski. How to improve the accuracy of estimation. Math. Methods Statist., 
8(4):441-486 (2000), 1999. 

[24] M. Lerasle. Optimal model selection in density estimation. Preprint, 2009. 

[25] K. C. Li. Honest confidence regions for nonparametric regression. Ann. Statist., 
17(3):1001-1008, 1989. 

[26] M. G. Low. On nonparametric confidence intervals. Ann. Statist., 25(6):2547-2554, 1997. 

[27] P. Massart. Concentration inequalities and model selection, volume 1896 of Lecture Notes in 
Mathematics. Springer, Berlin, 2007. Lectures from the 33rd Summer School on Probability 
Theory held in Saint-Flour, July 6-23, 2003, With a foreword by Jean Picard. 

[28] J. Robins and A. van der Vaart. Adaptive nonparametric confidence sets. Ann. Statist., 
34(1):229-253, 2006. 






